Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

2023-09-25 Thread Robin Murphy

On 2023-09-25 14:29, Jason Gunthorpe wrote:

On Mon, Sep 25, 2023 at 02:07:50PM +0100, Robin Murphy wrote:

On 2023-09-23 00:33, Jason Gunthorpe wrote:

On Fri, Sep 22, 2023 at 07:07:40PM +0100, Robin Murphy wrote:


virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings
either; it sets it once it's discovered any instance, since apparently it's
assuming that all instances must support identical page sizes, and thus once
it's seen one it can work "normally" per the core code's assumptions. It's
also I think the only driver which has a "finalise" bodge but *can* still
properly support map-before-attach, by virtue of having to replay mappings
to every new endpoint anyway.


Well it can't quite do that since it doesn't know the geometry - it
all is sort of guessing and hoping it doesn't explode on replay. If it
knows the geometry it wouldn't need finalize...


I think it's entirely reasonable to assume that any direct mappings
specified for a device are valid for that device and its IOMMU. However, in
the particular case of virtio, it really shouldn't ever have direct mappings
anyway, since even if the underlying hardware did have any, the host can
enforce the actual direct-mapping aspect itself, and just present them as
unusable regions to the guest.


I assume this machinery is for the ARM GIC ITS page


Again, that's irrelevant. It can only be about whether the actual
->map_pages call succeeds or not. A driver could well know up-front that all
instances support the same pgsize_bitmap and aperture, and set both at
->domain_alloc time, yet still be unable to handle an actual mapping without
knowing which instance(s) that needs to interact with (e.g. omap-iommu).


I think this is a different issue. The domain is supposed to represent
the actual io pte storage, and the storage is supposed to exist even
when the domain is not attached to anything.

As we said with tegra-gart, it is a bug in the driver if all the
mappings disappear when the last device is detached from the domain.
Driver bugs like this turn into significant issues with vfio/iommufd
as this will result in warn_on's and memory leaking.

So, I disagree that this is something we should be allowing in the API
design. map_pages should succeed (memory allocation failures aside) if
an IOVA within the aperture and valid flags are presented, regardless
of the attachment status. Calling map_pages with an IOVA outside the
aperture should be a caller bug.

It looks omap is just mis-designed to store the pgd in the omap_iommu,
not the omap_iommu_domain :( pgd is clearly a per-domain object in our
API. And why does every instance need its own copy of the identical
pgd?


The point wasn't that it was necessarily a good and justifiable example, 
just that it is one that exists, to demonstrate that in general we have 
no reasonable heuristic for guessing whether ->map_pages is going to 
succeed or not other than by calling it and seeing if it succeeds or 
not. And IMO it's a complete waste of time thinking about ways to make 
such a heuristic possible instead of just getting on with fixing 
iommu_domain_alloc() to make the problem disappear altogether. Once 
Joerg pushes out the current queue I'll rebase and resend v4 of the bus 
ops removal, then hopefully get back to despairing at the hideous pile 
of WIP iommu_domain_alloc() patches I currently have on top of it...


Thanks,
Robin.


Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

2023-09-25 Thread Robin Murphy

On 2023-09-23 00:33, Jason Gunthorpe wrote:

On Fri, Sep 22, 2023 at 07:07:40PM +0100, Robin Murphy wrote:


virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings
either; it sets it once it's discovered any instance, since apparently it's
assuming that all instances must support identical page sizes, and thus once
it's seen one it can work "normally" per the core code's assumptions. It's
also I think the only driver which has a "finalise" bodge but *can* still
properly support map-before-attach, by virtue of having to replay mappings
to every new endpoint anyway.


Well it can't quite do that since it doesn't know the geometry - it
all is sort of guessing and hoping it doesn't explode on replay. If it
knows the geometry it wouldn't need finalize...


I think it's entirely reasonable to assume that any direct mappings 
specified for a device are valid for that device and its IOMMU. However, 
in the particular case of virtio, it really shouldn't ever have direct 
mappings anyway, since even if the underlying hardware did have any, the 
host can enforce the actual direct-mapping aspect itself, and just 
present them as unusable regions to the guest.



What do you think about something like this to replace
iommu_create_device_direct_mappings(), that does enforce things
properly?


I fail to see how that would make any practical difference. Either the
mappings can be correctly set up in a pagetable *before* the relevant device
is attached to that pagetable, or they can't (if the driver doesn't have
enough information to be able to do so) and we just have to really hope
nothing blows up in the race window between attaching the device to an empty
pagetable and having a second try at iommu_create_device_direct_mappings().
That's a driver-level issue and has nothing to do with pgsize_bitmap either
way.


Except we don't detect this in the core code correctly, that is my
point. We should detect the aperture conflict, not pgsize_bitmap to
check if it is the first or second try.


Again, that's irrelevant. It can only be about whether the actual 
->map_pages call succeeds or not. A driver could well know up-front that 
all instances support the same pgsize_bitmap and aperture, and set both 
at ->domain_alloc time, yet still be unable to handle an actual mapping 
without knowing which instance(s) that needs to interact with (e.g. 
omap-iommu).


Thanks,
Robin.


Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

2023-09-22 Thread Robin Murphy

On 22/09/2023 5:27 pm, Jason Gunthorpe wrote:

On Fri, Sep 22, 2023 at 02:13:18PM +0100, Robin Murphy wrote:

On 22/09/2023 1:41 pm, Jason Gunthorpe wrote:

On Fri, Sep 22, 2023 at 08:57:19AM +0100, Jean-Philippe Brucker wrote:

They're not strictly equivalent: this check works around a temporary issue
with the IOMMU core, which calls map/unmap before the domain is
finalized.


Where? The above points to iommu_create_device_direct_mappings() but
it doesn't because the pgsize_bitmap == 0:


__iommu_domain_alloc() sets pgsize_bitmap in this case:

  /*
   * If not already set, assume all sizes by default; the driver
   * may override this later
   */
  if (!domain->pgsize_bitmap)
  domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap;


Drivers shouldn't do that.

The core code was fixed to try again with mapping reserved regions to
support these kinds of drivers.


This is still the "normal" code path, really; I think it's only AMD that
started initialising the domain bitmap "early" and warranted making it
conditional.


My main point was that iommu_create_device_direct_mappings() should
fail for unfinalized domains, setting pgsize_bitmap to allow it to
succeed is not a nice hack, and not necessary now.


Sure, but it's the whole "unfinalised domains" and rewriting 
domain->pgsize_bitmap after attach thing that is itself the massive 
hack. AMD doesn't do that, and doesn't need to; it knows the appropriate 
format at allocation time and can quite happily return a fully working 
domain which allows map before attach, but the old ops->pgsize_bitmap 
mechanism fundamentally doesn't work for multiple formats with different 
page sizes. The only thing I'd accuse it of doing wrong is the weird 
half-and-half thing of having one format as a default via one mechanism, 
and the other as an override through the other, rather than setting both 
explicitly.


virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings 
either; it sets it once it's discovered any instance, since apparently 
it's assuming that all instances must support identical page sizes, and 
thus once it's seen one it can work "normally" per the core code's 
assumptions. It's also I think the only driver which has a "finalise" 
bodge but *can* still properly support map-before-attach, by virtue of 
having to replay mappings to every new endpoint anyway.



What do you think about something like this to replace
iommu_create_device_direct_mappings(), that does enforce things
properly?


I fail to see how that would make any practical difference. Either the 
mappings can be correctly set up in a pagetable *before* the relevant 
device is attached to that pagetable, or they can't (if the driver 
doesn't have enough information to be able to do so) and we just have to 
really hope nothing blows up in the race window between attaching the 
device to an empty pagetable and having a second try at 
iommu_create_device_direct_mappings(). That's a driver-level issue and 
has nothing to do with pgsize_bitmap either way.


Thanks,
Robin.


Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

2023-09-22 Thread Robin Murphy

On 22/09/2023 1:41 pm, Jason Gunthorpe wrote:

On Fri, Sep 22, 2023 at 08:57:19AM +0100, Jean-Philippe Brucker wrote:

They're not strictly equivalent: this check works around a temporary issue
with the IOMMU core, which calls map/unmap before the domain is
finalized.


Where? The above points to iommu_create_device_direct_mappings() but
it doesn't because the pgsize_bitmap == 0:


__iommu_domain_alloc() sets pgsize_bitmap in this case:

 /*
  * If not already set, assume all sizes by default; the driver
  * may override this later
  */
 if (!domain->pgsize_bitmap)
 domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap;


Drivers shouldn't do that.

The core code was fixed to try again with mapping reserved regions to
support these kinds of drivers.


This is still the "normal" code path, really; I think it's only AMD that 
started initialising the domain bitmap "early" and warranted making it 
conditional. However we *do* ultimately want all the drivers to do the 
same, so we can get rid of ops->pgsize_bitmap, because it's already 
pretty redundant and meaningless in the face of per-domain pagetable 
formats.


Thanks,
Robin.


Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

2023-09-19 Thread Robin Murphy

On 2023-09-19 09:15, Jean-Philippe Brucker wrote:

On Mon, Sep 18, 2023 at 05:37:47PM +0100, Robin Murphy wrote:

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 17dcd826f5c2..3649586f0e5c 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu)
int ret;
unsigned long flags;
+   /*
+* .iotlb_sync_map and .flush_iotlb_all may be called before the viommu
+* is initialized e.g. via iommu_create_device_direct_mappings()
+*/
+   if (!viommu)
+   return 0;


Minor nit: I'd be inclined to make that check explicitly in the places where
it definitely is expected, rather than allowing *any* sync to silently do
nothing if called incorrectly. Plus then they could use
vdomain->nr_endpoints for consistency with the equivalent checks elsewhere
(it did take me a moment to figure out how we could get to .iotlb_sync_map
with a NULL viommu without viommu_map_pages() blowing up first...)


They're not strictly equivalent: this check works around a temporary issue
with the IOMMU core, which calls map/unmap before the domain is finalized.
Once we merge domain_alloc() and finalize(), then this check disappears,
but we still need to test nr_endpoints in map/unmap to handle detached
domains (and we still need to fix the synchronization of nr_endpoints
against attach/detach). That's why I preferred doing this on viommu and
keeping it in one place.


Fair enough - it just seems to me that in both cases it's a detached 
domain, so its previous history of whether it's ever been otherwise or 
not shouldn't matter. Even once viommu is initialised, does it really 
make sense to send sync commands for a mapping on a detached domain 
where we haven't actually sent any map/unmap commands?


Thanks,
Robin.


Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map

2023-09-18 Thread Robin Murphy

On 2023-09-18 12:51, Niklas Schnelle wrote:

Pull out the sync operation from viommu_map_pages() by implementing
ops->iotlb_sync_map. This allows the common IOMMU code to map multiple
elements of an sg with a single sync (see iommu_map_sg()). Furthermore,
it is also a requirement for IOMMU_CAP_DEFERRED_FLUSH.


Is it really a requirement? Deferred flush only deals with unmapping. Or 
are you just trying to say that it's not too worthwhile to try doing 
more for unmapping performance while obvious mapping performance is 
still left on the table?
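(For context, the batching that "see iommu_map_sg()" refers to is sketched below - a simplified model of the core-code flow that glosses over the contiguous-range merging and error unwinding the real function does, with __iommu_map() standing in for the per-element map that no longer syncs on its own:)

static ssize_t map_sg_batched(struct iommu_domain *domain, unsigned long iova,
			      struct scatterlist *sg, unsigned int nents,
			      int prot, gfp_t gfp)
{
	const struct iommu_domain_ops *ops = domain->ops;
	size_t mapped = 0;
	unsigned int i;
	int ret;

	for (i = 0; i < nents; i++, sg = sg_next(sg)) {
		/* Map each element without flushing... */
		ret = __iommu_map(domain, iova + mapped, sg_phys(sg),
				  sg->length, prot, gfp);
		if (ret)
			return ret;
		mapped += sg->length;
	}

	/* ...then issue a single sync covering the whole batch */
	if (ops->iotlb_sync_map)
		ops->iotlb_sync_map(domain, iova, mapped);

	return mapped;
}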



Link: 
https://lore.kernel.org/lkml/20230726111433.1105665-1-schne...@linux.ibm.com/
Signed-off-by: Niklas Schnelle 
---
  drivers/iommu/virtio-iommu.c | 17 -
  1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 17dcd826f5c2..3649586f0e5c 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu)
int ret;
unsigned long flags;
  
+	/*

+* .iotlb_sync_map and .flush_iotlb_all may be called before the viommu
+* is initialized e.g. via iommu_create_device_direct_mappings()
+*/
+   if (!viommu)
+   return 0;


Minor nit: I'd be inclined to make that check explicitly in the places 
where it definitely is expected, rather than allowing *any* sync to 
silently do nothing if called incorrectly. Plus then they could use 
vdomain->nr_endpoints for consistency with the equivalent checks 
elsewhere (it did take me a moment to figure out how we could get to 
.iotlb_sync_map with a NULL viommu without viommu_map_pages() blowing up 
first...)
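For illustration, the alternative suggested here might look roughly like the sketch below, reusing the existing vdomain->nr_endpoints field much as the other paths already do (an untested sketch, not the posted patch):

/*
 * Bail out early in the callbacks that can legitimately run against a
 * domain which has never been attached, instead of special-casing a NULL
 * viommu inside viommu_sync_req(). Illustrative only.
 */
static int viommu_iotlb_sync_map(struct iommu_domain *domain,
				 unsigned long iova, size_t size)
{
	struct viommu_domain *vdomain = to_viommu_domain(domain);

	/* Nothing has been mapped through the viommu yet, nothing to sync */
	if (!vdomain->nr_endpoints)
		return 0;
	return viommu_sync_req(vdomain->viommu);
}

static void viommu_flush_iotlb_all(struct iommu_domain *domain)
{
	struct viommu_domain *vdomain = to_viommu_domain(domain);

	if (!vdomain->nr_endpoints)
		return;
	viommu_sync_req(vdomain->viommu);
}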


Thanks,
Robin.


spin_lock_irqsave(&viommu->request_lock, flags);
ret = __viommu_sync_req(viommu);
if (ret)
@@ -843,7 +849,7 @@ static int viommu_map_pages(struct iommu_domain *domain, 
unsigned long iova,
.flags  = cpu_to_le32(flags),
};
  
-		ret = viommu_send_req_sync(vdomain->viommu, &map, sizeof(map));

+   ret = viommu_add_req(vdomain->viommu, &map, sizeof(map));
if (ret) {
viommu_del_mappings(vdomain, iova, end);
return ret;
@@ -912,6 +918,14 @@ static void viommu_iotlb_sync(struct iommu_domain *domain,
viommu_sync_req(vdomain->viommu);
  }
  
+static int viommu_iotlb_sync_map(struct iommu_domain *domain,

+unsigned long iova, size_t size)
+{
+   struct viommu_domain *vdomain = to_viommu_domain(domain);
+
+   return viommu_sync_req(vdomain->viommu);
+}
+
  static void viommu_get_resv_regions(struct device *dev, struct list_head 
*head)
  {
struct iommu_resv_region *entry, *new_entry, *msi = NULL;
@@ -1058,6 +1072,7 @@ static struct iommu_ops viommu_ops = {
.unmap_pages= viommu_unmap_pages,
.iova_to_phys   = viommu_iova_to_phys,
.iotlb_sync = viommu_iotlb_sync,
+   .iotlb_sync_map = viommu_iotlb_sync_map,
.free   = viommu_domain_free,
}
  };




Re: [PATCH 2/2] iommu/virtio: Add ops->flush_iotlb_all and enable deferred flush

2023-09-04 Thread Robin Murphy

On 2023-09-04 16:34, Jean-Philippe Brucker wrote:

On Fri, Aug 25, 2023 at 05:21:26PM +0200, Niklas Schnelle wrote:

Add ops->flush_iotlb_all operation to enable virtio-iommu for the
dma-iommu deferred flush scheme. This results in a significant increase
in performance in exchange for a window in which devices can still
access previously IOMMU mapped memory. To get back to the prior behavior
iommu.strict=1 may be set on the kernel command line.


Maybe add that it depends on CONFIG_IOMMU_DEFAULT_DMA_{LAZY,STRICT} as
well, because I've seen kernel configs that enable either.


Indeed, I'd be inclined to phrase it in terms of the driver now actually 
being able to honour lazy mode when requested (which happens to be the 
default on x86), rather than as if it might be some 
potentially-unexpected change in behaviour.


Thanks,
Robin.


Link: https://lore.kernel.org/lkml/20230802123612.GA6142@myrica/
Signed-off-by: Niklas Schnelle 
---
  drivers/iommu/virtio-iommu.c | 12 
  1 file changed, 12 insertions(+)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index fb73dec5b953..1b7526494490 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -924,6 +924,15 @@ static int viommu_iotlb_sync_map(struct iommu_domain 
*domain,
return viommu_sync_req(vdomain->viommu);
  }
  
+static void viommu_flush_iotlb_all(struct iommu_domain *domain)

+{
+   struct viommu_domain *vdomain = to_viommu_domain(domain);
+
+   if (!vdomain->nr_endpoints)
+   return;


As for patch 1, a NULL check in viommu_sync_req() would allow dropping
this one

Thanks,
Jean


+   viommu_sync_req(vdomain->viommu);
+}
+
  static void viommu_get_resv_regions(struct device *dev, struct list_head 
*head)
  {
struct iommu_resv_region *entry, *new_entry, *msi = NULL;
@@ -1049,6 +1058,8 @@ static bool viommu_capable(struct device *dev, enum 
iommu_cap cap)
switch (cap) {
case IOMMU_CAP_CACHE_COHERENCY:
return true;
+   case IOMMU_CAP_DEFERRED_FLUSH:
+   return true;
default:
return false;
}
@@ -1069,6 +1080,7 @@ static struct iommu_ops viommu_ops = {
.map_pages  = viommu_map_pages,
.unmap_pages= viommu_unmap_pages,
.iova_to_phys   = viommu_iova_to_phys,
+   .flush_iotlb_all= viommu_flush_iotlb_all,
.iotlb_sync = viommu_iotlb_sync,
.iotlb_sync_map = viommu_iotlb_sync_map,
.free   = viommu_domain_free,

--
2.39.2




Re: [PATCH] iommu: Explicitly include correct DT includes

2023-08-07 Thread Robin Murphy

On 14/07/2023 6:46 pm, Rob Herring wrote:

The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it was merged into the regular platform bus.
As part of that merge prepping Arm DT support 13 years ago, they
"temporarily" include each other. They also include platform_device.h
and of.h. As a result, there's a pretty much random mix of those include
files used throughout the tree. In order to detangle these headers and
replace the implicit includes with struct declarations, users need to
explicitly include the correct includes.


Thanks Rob; FWIW,

Acked-by: Robin Murphy 

I guess you're hoping for Joerg to pick this up? However I wouldn't 
foresee any major conflicts if you do need to take it through the OF tree.


Cheers,
Robin.


Signed-off-by: Rob Herring 
---
  drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c | 2 +-
  drivers/iommu/arm/arm-smmu/arm-smmu.c| 1 -
  drivers/iommu/arm/arm-smmu/qcom_iommu.c  | 3 +--
  drivers/iommu/ipmmu-vmsa.c   | 1 -
  drivers/iommu/sprd-iommu.c   | 1 +
  drivers/iommu/tegra-smmu.c   | 2 +-
  drivers/iommu/virtio-iommu.c | 2 +-
  7 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
index b5b14108e086..bb89d49adf8d 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
@@ -3,7 +3,7 @@
   * Copyright (c) 2022 Qualcomm Innovation Center, Inc. All rights reserved.
   */
  
-#include 

+#include 
  #include 
  #include 
  
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c

index a86acd76c1df..d6d1a2a55cc0 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -29,7 +29,6 @@
  #include 
  #include 
  #include 
-#include 
  #include 
  #include 
  #include 
diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c 
b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
index a503ed758ec3..cc3f68a3516c 100644
--- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
+++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
@@ -22,8 +22,7 @@
  #include 
  #include 
  #include 
-#include 
-#include 
+#include 
  #include 
  #include 
  #include 
diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 9f64c5c9f5b9..0aeedd3e1494 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -17,7 +17,6 @@
  #include 
  #include 
  #include 
-#include 
  #include 
  #include 
  #include 
diff --git a/drivers/iommu/sprd-iommu.c b/drivers/iommu/sprd-iommu.c
index 39e34fdeccda..51144c232474 100644
--- a/drivers/iommu/sprd-iommu.c
+++ b/drivers/iommu/sprd-iommu.c
@@ -14,6 +14,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  
diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c

index 1cbf063ccf14..e445f80d0226 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -9,7 +9,7 @@
  #include 
  #include 
  #include 
-#include 
+#include 
  #include 
  #include 
  #include 
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 3551ed057774..17dcd826f5c2 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -13,7 +13,7 @@
  #include 
  #include 
  #include 
-#include 
+#include 
  #include 
  #include 
  #include 



Re: [PATCH v2 04/10] iommu/dma: Use the gfp parameter in __iommu_dma_alloc_noncontiguous()

2023-01-20 Thread Robin Murphy

On 2023-01-18 18:00, Jason Gunthorpe wrote:

Change the sg_alloc_table_from_pages() allocation that was hardwired to
GFP_KERNEL to use the gfp parameter like the other allocations in this
function.

Auditing says this is never called from an atomic context, so it is safe
as is, but reads wrong.


I think the point may have been that the sgtable metadata is a 
logically-distinct allocation from the buffer pages themselves. Much 
like the allocation of the pages array itself further down in 
__iommu_dma_alloc_pages(). I see these days it wouldn't be catastrophic 
to pass GFP_HIGHMEM into __get_free_page() via sg_kmalloc(), but still, 
allocating implementation-internal metadata with all the same 
constraints as a DMA buffer has just as much smell of wrong about it IMO.


I'd say the more confusing thing about this particular context is why 
we're using iommu_map_sg_atomic() further down - that seems to have been 
an oversight in 781ca2de89ba, since this particular path has never 
supported being called in atomic context.


Overall I'm starting to wonder if it might not be better to stick a "use 
GFP_KERNEL_ACCOUNT if you allocate" flag in the domain for any level of 
the API internals to pick up as appropriate, rather than propagate 
per-call gfp flags everywhere. As it stands we're still missing 
potential pagetable and other domain-related allocations by drivers in 
.attach_dev and even (in probably-shouldn't-really-happen cases) 
.unmap_pages...
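(Purely to illustrate that last idea, a minimal sketch with invented names - a per-domain hint that internal allocations consult, rather than a gfp threaded through every call; nothing like this exists in the tree:)

/* Hypothetical - names invented for illustration */
struct iommu_domain_alloc_hints {
	bool account;	/* charge pagetable/metadata allocations to the caller's cgroup */
};

static inline gfp_t domain_internal_gfp(const struct iommu_domain_alloc_hints *hints)
{
	return hints->account ? GFP_KERNEL_ACCOUNT : GFP_KERNEL;
}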


Thanks,
Robin.


Signed-off-by: Jason Gunthorpe 
---
  drivers/iommu/dma-iommu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 8c2788633c1766..e4bf1bb159f7c7 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -822,7 +822,7 @@ static struct page **__iommu_dma_alloc_noncontiguous(struct 
device *dev,
if (!iova)
goto out_free_pages;
  
-	if (sg_alloc_table_from_pages(sgt, pages, count, 0, size, GFP_KERNEL))

+   if (sg_alloc_table_from_pages(sgt, pages, count, 0, size, gfp))
goto out_free_iova;
  
  	if (!(ioprot & IOMMU_CACHE)) {



Re: [PATCH 1/8] iommu: Add a gfp parameter to iommu_map()

2023-01-06 Thread Robin Murphy

On 2023-01-06 16:42, Jason Gunthorpe wrote:

The internal mechanisms support this, but instead of exposing the gfp to
the caller it wraps it into iommu_map() and iommu_map_atomic().

Fix this instead of adding more variants for GFP_KERNEL_ACCOUNT.


FWIW, since we *do* have two variants already, I think I'd have a mild 
preference for leaving the regular map calls as-is (i.e. implicit 
GFP_KERNEL), and just generalising the _atomic versions for the special 
cases.


However, echoing the recent activity over on the DMA API side of things, 
I think it's still worth proactively constraining the set of permissible 
flags, lest we end up with more weird problems if stuff that doesn't 
really make sense, like GFP_COMP or zone flags, manages to leak through 
(that may have been part of the reason for having the current wrappers 
rather than a bare gfp argument in the first place, I forget now).
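(By way of example, the sort of gatekeeping meant is sketched below - a check at the iommu_map() entry points rejecting flags which make no sense for pagetable allocations; a sketch of the idea, not a claim about the eventual interface:)

/* Sketch: refuse zone modifiers and compound-page requests before they
 * can leak through to a driver's pagetable allocator. */
static bool iommu_gfp_is_reasonable(gfp_t gfp)
{
	return !(gfp & (__GFP_COMP | __GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM));
}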


Thanks,
Robin.


Signed-off-by: Jason Gunthorpe 
---
  arch/arm/mm/dma-mapping.c   | 11 +++
  .../gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c |  3 ++-
  drivers/gpu/drm/tegra/drm.c |  2 +-
  drivers/gpu/host1x/cdma.c   |  2 +-
  drivers/infiniband/hw/usnic/usnic_uiom.c|  4 ++--
  drivers/iommu/dma-iommu.c   |  2 +-
  drivers/iommu/iommu.c   | 17 ++---
  drivers/iommu/iommufd/pages.c   |  6 --
  drivers/media/platform/qcom/venus/firmware.c|  2 +-
  drivers/net/ipa/ipa_mem.c   |  6 --
  drivers/net/wireless/ath/ath10k/snoc.c  |  2 +-
  drivers/net/wireless/ath/ath11k/ahb.c   |  4 ++--
  drivers/remoteproc/remoteproc_core.c|  5 +++--
  drivers/vfio/vfio_iommu_type1.c |  9 +
  drivers/vhost/vdpa.c|  2 +-
  include/linux/iommu.h   |  4 ++--
  16 files changed, 43 insertions(+), 38 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index c135f6e37a00ca..8bc01071474ab7 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -984,7 +984,8 @@ __iommu_create_mapping(struct device *dev, struct page 
**pages, size_t size,
  
  		len = (j - i) << PAGE_SHIFT;

ret = iommu_map(mapping->domain, iova, phys, len,
-   __dma_info_to_prot(DMA_BIDIRECTIONAL, attrs));
+   __dma_info_to_prot(DMA_BIDIRECTIONAL, attrs),
+   GFP_KERNEL);
if (ret < 0)
goto fail;
iova += len;
@@ -1207,7 +1208,8 @@ static int __map_sg_chunk(struct device *dev, struct 
scatterlist *sg,
  
  		prot = __dma_info_to_prot(dir, attrs);
  
-		ret = iommu_map(mapping->domain, iova, phys, len, prot);

+   ret = iommu_map(mapping->domain, iova, phys, len, prot,
+   GFP_KERNEL);
if (ret < 0)
goto fail;
count += len >> PAGE_SHIFT;
@@ -1379,7 +1381,8 @@ static dma_addr_t arm_iommu_map_page(struct device *dev, 
struct page *page,
  
  	prot = __dma_info_to_prot(dir, attrs);
  
-	ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len, prot);

+   ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len,
+   prot, GFP_KERNEL);
if (ret < 0)
goto fail;
  
@@ -1443,7 +1446,7 @@ static dma_addr_t arm_iommu_map_resource(struct device *dev,
  
  	prot = __dma_info_to_prot(dir, attrs) | IOMMU_MMIO;
  
-	ret = iommu_map(mapping->domain, dma_addr, addr, len, prot);

+   ret = iommu_map(mapping->domain, dma_addr, addr, len, prot, GFP_KERNEL);
if (ret < 0)
goto fail;
  
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c

index 648ecf5a8fbc2a..a4ac94a2ab57fc 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
@@ -475,7 +475,8 @@ gk20a_instobj_ctor_iommu(struct gk20a_instmem *imem, u32 
npages, u32 align,
u32 offset = (r->offset + i) << imem->iommu_pgshift;
  
  		ret = iommu_map(imem->domain, offset, node->dma_addrs[i],

-   PAGE_SIZE, IOMMU_READ | IOMMU_WRITE);
+   PAGE_SIZE, IOMMU_READ | IOMMU_WRITE,
+   GFP_KERNEL);
if (ret < 0) {
nvkm_error(subdev, "IOMMU mapping failure: %d\n", ret);
  
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c

index 7bd2e65c2a16c5..6ca9f396e55be4 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -1057,7 +1057,7 @@ void *tegra_drm_alloc(struct tegra_drm *tegra, size_t 
size, dma_addr_t *dma)
  
*dma = iova_dma_addr(&tegra->carveout.domain, alloc);

err = 

Re: The arm smmu driver for Linux does not support debugfs

2022-11-15 Thread Robin Murphy

On 2022-11-15 02:28, leo-...@hotmail.com wrote:



Hi,

  Why doesn't the arm smmu driver for Linux support debugfs ?


Because nobody's ever written any debugfs code for it.


Are there any historical reasons?


Only that so far nobody's needed to.

TBH, arm-smmu is actually quite straightforward, and none of the 
internal driver state is really all that interesting (other than the 
special private Adreno stuff, but we leave it up to Rob to implement 
whatever he needs there). Given the kernel config, module parameters, 
and the features logged at probe, you can already infer how it will set 
up context banks etc. for regular IOMMU API work; there won't be any 
surprises. At this point there shouldn't be any need to debug the driver 
itself, it's mature and stable. For debugging *users* of the driver, 
I've only dealt with the DMA layer, where a combination of the IOMMU API 
tracepoints, CONFIG_DMA_API_DEBUG, and my own hacks to iommu-dma have 
always proved sufficient to get enough insight into what's being mapped 
where.


I think a couple of people have previously raised the idea of 
implementing some kind of debugfs dumping for io-pgtable, but nothing's 
ever come of it. As above, it often turns out that you can find the 
information you need from other existing sources, thus the effort of 
implementing and maintaining a load of special-purpose debug code can be 
saved. In particular it would not be worth having driver-specific code 
that only helps debug generic IOMMU API usage - that would be much 
better implemented at the generic IOMMU API level.


Thanks,
Robin.

Re: [PATCH 4/5] iommu: Regulate errno in ->attach_dev callback functions

2022-09-14 Thread Robin Murphy

On 2022-09-14 18:58, Nicolin Chen wrote:

On Wed, Sep 14, 2022 at 10:49:42AM +0100, Jean-Philippe Brucker wrote:

External email: Use caution opening links or attachments


On Wed, Sep 14, 2022 at 06:11:06AM -0300, Jason Gunthorpe wrote:

On Tue, Sep 13, 2022 at 01:27:03PM +0100, Jean-Philippe Brucker wrote:

I think in the future it will be too easy to forget about the constrained
return value of attach() while modifying some other part of the driver,
and let an external helper return EINVAL. So I'd rather not propagate ret
from outside of viommu_domain_attach() and finalise().


Fortunately, if -EINVAL is wrongly returned it only creates an
inefficiency, not a functional problem. So we do not need to be
precise here.


Ah fair. In that case the attach_dev() documentation should indicate that
EINVAL is a hint, so that callers don't rely on it (currently words "must"
and "exclusively" indicate that returning EINVAL for anything other than
device-domain incompatibility is unacceptable). The virtio-iommu
implementation may well return EINVAL from the virtio stack or from the
host response.


How about this?

+ * * EINVAL- mainly, device and domain are incompatible, or something went
+ *   wrong with the domain. It's suggested to avoid kernel prints
+ *   along with this errno. And it's better to convert any EINVAL
+ *   returned from kAPIs to ENODEV if it is device-specific, or to
+ *   some other reasonable errno being listed below


FWIW, I'd say something like:

"The device and domain are incompatible. If this is due to some previous 
configuration of the domain, drivers should not log an error, since it 
is legitimate for callers to test reuse of an existing domain. 
Otherwise, it may still represent some fundamental problem."


And then at the public interfaces state it from other angle:

"The device and domain are incompatible. If the domain has already been 
used or configured in some way, attaching the same device to a different 
domain may be expected to succeed. Otherwise, it may still represent 
some fundamental problem."


[ and to save another mail, I'm not sure copying the default comment for 
ENOSPC is all that helpful either - what is "space" for something that 
isn't a storage device? I'd guess limited hardware resources in some 
form, but in the IOMMU context, potential confusion with address space 
is maybe a little too close for comfort? ]



Since we can't guarantee that APIs like virtio or ida won't ever return
EINVAL, we should set all return values:


I dislike this a lot, it squashes all return codes to try to optimize
an obscure failure path :(


Hmm...should I revert all the driver changes back to this version?


Yeah, I don't think we need to go too mad here. Drivers shouldn't emit 
their *own* -EINVAL unless appropriate, but if it comes back from some 
external API then that implies something's gone unexpectedly wrong 
anyway - maybe it's a transient condition and a subsequent different 
attach might actually work out OK? We can't really say in general. 
Besides, if the driver sees an error which implies it's done something 
wrong itself, it probably shouldn't be trusted to try to reason about it 
further. The caller can handle any error as long as we set their 
expectations correctly.


Thanks,
Robin.


Re: [PATCH v6 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-09-08 Thread Robin Murphy

On 2022-09-08 01:43, Jason Gunthorpe wrote:

On Wed, Sep 07, 2022 at 08:41:13PM +0100, Robin Murphy wrote:


FWIW, we're now very close to being able to validate dev->iommu against
where the domain came from in core code, and so short-circuit ->attach_dev
entirely if they don't match.


I don't think this is a long term direction. We have systems now with
a number of SMMU blocks and we really are going to see a need that
they share the iommu_domains so we don't have unnecessary overheads
from duplicated io page table memory.

So ultimately I'd expect to pass the iommu_domain to the driver and
the driver will decide if the page table memory it represents is
compatible or not. Restricting to only the same iommu instance isn't
good..


Who said IOMMU instance?


Ah, I completely misunderstood what 'dev->iommu' was referring to, OK
I see.


Again, not what I was suggesting. In fact the nature of iommu_attach_group()
already rules out bogus devices getting this far, so all a driver currently
has to worry about is compatibility of a device that it definitely probed
with a domain that it definitely allocated. Therefore, from a caller's point
of view, if attaching to an existing domain returns -EINVAL, try another
domain; multiple different existing domains can be tried, and may also
return -EINVAL for the same or different reasons; the final attempt is to
allocate a fresh domain and attach to that, which should always be nominally
valid and *never* return -EINVAL. If any attempt returns any other error,
bail out down the usual "this should have worked but something went wrong"
path. Even if any driver did have a nonsensical "nothing went wrong, I just
can't attach my device to any of my domains" case, I don't think it would
really need distinguishing from any other general error anyway.


The algorithm you described is exactly what this series does, it just
used EMEDIUMTYPE instead of EINVAL. Changing it to EINVAL is not a
fundamental problem, just a bit more work.

Looking at Nicolin's series there is a bunch of existing errnos that
would still need converting, ie EXDEV, EBUSY, EOPNOTSUPP, EFAULT, and
ENXIO are all returned as codes for 'domain incompatible with device'
in various drivers. So the patch would still look much the same, just
changing them to EINVAL instead of EMEDIUMTYPE.

That leaves the question of the remaining EINVAL's that Nicolin did
not convert to EMEDIUMTYPE.

eg in the AMD driver:

if (!check_device(dev))
return -EINVAL;

iommu = rlookup_amd_iommu(dev);
if (!iommu)
return -EINVAL;

These are all cases of 'something is really wrong with the device or
iommu, everything will fail'. Other drivers are using ENODEV for this
already, so we'd probably have an additional patch changing various
places like that to ENODEV.

This mixture of error codes is the basic reason why a new code was
used, because none of the existing codes are used with any
consistency.

But OK, I'm on board, lets use more common errnos with specific
meaning, that can be documented in a comment someplace:
  ENOMEM - out of memory
  ENODEV - no domain can attach, device or iommu is messed up
  EINVAL - the domain is incompatible with the device
  <others> - Same behavior as ENODEV, use is discouraged.

I think achieving consistency of error codes is a generally desirable
goal, it makes the error code actually useful.

Joerg this is a good bit of work, will you be OK with it?


Thus as long as we can maintain that basic guarantee that attaching
a group to a newly allocated domain can only ever fail for resource
allocation reasons and not some spurious "incompatibility", then we
don't need any obscure trickery, and a single, clear, error code is
in fact enough to say all that needs to be said.


As above, this is not the case, drivers do seem to have error paths
that are unconditional on the domain. Perhaps they are just protective
assertions and never happen.


Right, that's the gist of what I was getting at - I think it's worth 
putting in the effort to audit and fix the drivers so that that *can* be 
the case, then we can have a meaningful error API with standard codes 
effectively for free, rather than just sighing at the existing mess and 
building a slightly esoteric special case on top.


Case in point, the AMD checks quoted above are pointless, since it 
checks the same things in ->probe_device, and if that fails then the 
device won't get a group so there's no way for it to even reach 
->attach_dev any more. I'm sure there's a *lot* of cruft that can be 
cleared out now that per-device and per-domain ops give us this kind of 
inherent robustness.


Cheers,
Robin.


Regardless, it doesn't matter. If they return ENODEV or EINVAL the
VFIO side algorithm will continue to work fine, it just does a lot more
work if EINVAL is permanently returned.

Thanks,
Jason


Re: [PATCH v6 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-09-07 Thread Robin Murphy

On 2022-09-07 18:00, Jason Gunthorpe wrote:

On Wed, Sep 07, 2022 at 03:23:09PM +0100, Robin Murphy wrote:

On 2022-09-07 14:47, Jason Gunthorpe wrote:

On Wed, Sep 07, 2022 at 02:41:54PM +0200, Joerg Roedel wrote:

On Mon, Aug 15, 2022 at 11:14:33AM -0700, Nicolin Chen wrote:

Provide a dedicated errno from the IOMMU driver during attach that the
reason the attach failed is domain incompatibility. EMEDIUMTYPE
is chosen because it is never used within the iommu subsystem today and
evokes a sense that the 'medium' aka the domain is incompatible.


I am not a fan of re-using EMEDIUMTYPE or any other special value. What
is needed here is EINVAL, but with a way to tell the caller which of the
function parameters is actually invalid.


Using errnos to indicate the nature of failure is a well established
unix practice, it is why we have hundreds of error codes and don't
just return -EINVAL for everything.

What don't you like about it?

Would you be happier if we wrote it like

   #define IOMMU_EINCOMPATIBLE_DEVICE xx

Which tells "which of the function parameters is actually invalid" ?


FWIW, we're now very close to being able to validate dev->iommu against
where the domain came from in core code, and so short-circuit ->attach_dev
entirely if they don't match.


I don't think this is a long term direction. We have systems now with
a number of SMMU blocks and we really are going to see a need that
they share the iommu_domains so we don't have unnecessary overheads
from duplicated io page table memory.

So ultimately I'd expect to pass the iommu_domain to the driver and
the driver will decide if the page table memory it represents is
compatible or not. Restricting to only the same iommu instance isn't
good..


Who said IOMMU instance? As a reminder, the patch I currently have[1] is 
matching the driver (via the device ops), which happens to be entirely 
compatible with drivers supporting cross-instance domains. Mostly 
because we already have drivers that support cross-instance domains and 
callers that use them.



At that point -EINVAL at the driver callback level could be assumed
to refer to the domain argument, while anything else could be taken
as something going unexpectedly wrong when the attach may otherwise
have worked. I've forgotten if we actually had a valid case anywhere
for "this is my device but even if you retry with a different domain
it's still never going to work", but I think we wouldn't actually
need that anyway - it should be clear enough to a caller that if
attaching to an existing domain fails, then allocating a fresh
domain and attaching also fails, that's the point to give up.


The point was to have clear error handling, we either have permenent
errors or 'this domain will never work with this device error'.

If we treat all error as temporary and just retry randomly it can
create a mess. For instance we might fail to attach to a perfectly
compatible domain due to ENOMEM or something and then go on to
successfully create a new 2nd domain, just due to races.

We can certainly code the try everything then allocate scheme, it is
just much more fragile than having definitive error codes.


Again, not what I was suggesting. In fact the nature of 
iommu_attach_group() already rules out bogus devices getting this far, 
so all a driver currently has to worry about is compatibility of a 
device that it definitely probed with a domain that it definitely 
allocated. Therefore, from a caller's point of view, if attaching to an 
existing domain returns -EINVAL, try another domain; multiple different 
existing domains can be tried, and may also return -EINVAL for the same 
or different reasons; the final attempt is to allocate a fresh domain 
and attach to that, which should always be nominally valid and *never* 
return -EINVAL. If any attempt returns any other error, bail out down 
the usual "this should have worked but something went wrong" path. Even 
if any driver did have a nonsensical "nothing went wrong, I just can't 
attach my device to any of my domains" case, I don't think it would 
really need distinguishing from any other general error anyway.


Once multiple drivers are in play, the only addition is that the 
"gatekeeper" check inside iommu_attach_group() may also return -EINVAL 
if the device is managed by a different driver, since that still fits 
the same "try again with a different domain" message to the caller.


It's actually quite neat - basically the exact same thing we've tried to 
do with -EMEDIUMTYPE here, but more self-explanatory, since the fact is 
that a domain itself should never be invalid for attaching to via its 
own ops, and a group should never be inherently invalid for attaching to 
a suitable domain, it is only ever a particular combination of group (or 
device at the internal level) and domain that may not be valid together. 
Thus as long as we can maintain that basic guarantee that attaching a group 
to a newly allocated domain can only ever fail for resource allocation 
reasons and not some spurious "incompatibility", then we don't need any 
obscure trickery, and a single, clear, error code is in fact enough to say 
all that needs to be said.
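(Expressed as caller-side pseudo-code, the flow described above looks roughly like this - a hypothetical helper, not the actual VFIO/iommufd code:)

static struct iommu_domain *
attach_to_compatible_domain(struct device *dev, struct iommu_group *group,
			    struct iommu_domain **candidates, int count)
{
	struct iommu_domain *domain;
	int i, ret;

	for (i = 0; i < count; i++) {
		ret = iommu_attach_group(candidates[i], group);
		if (!ret)
			return candidates[i];	/* compatible: reuse it */
		if (ret != -EINVAL)
			return ERR_PTR(ret);	/* real failure: give up */
		/* -EINVAL: incompatible combination, try the next domain */
	}

	/* Final attempt: a fresh domain should never fail with -EINVAL */
	domain = iommu_domain_alloc(dev->bus);
	if (!domain)
		return ERR_PTR(-ENOMEM);

	ret = iommu_attach_group(domain, group);
	if (ret) {
		iommu_domain_free(domain);
		return ERR_PTR(ret);
	}
	return domain;
}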

Re: [PATCH] iommu/virtio: Fix compile error with viommu_capable()

2022-09-07 Thread Robin Murphy

On 2022-09-07 16:11, Joerg Roedel wrote:

From: Joerg Roedel 

A recent fix introduced viommu_capable() but other changes
from Robin change the function signature of the call-back it
is used for.

When both changes are merged a compile error will happen
because the function pointer types mismatch. Fix that by
updating the viommu_capable() signature after the merge.


I thought I'd called out somewhere that this was going to be a conflict, 
but apparently not, sorry about that.


Acked-by: Robin Murphy 

Lemme spin a patch for the outstanding LKP warning on the bus series 
before that gets annoying too...



Cc: Jean-Philippe Brucker 
Cc: Robin Murphy 
Signed-off-by: Joerg Roedel 
---
  drivers/iommu/virtio-iommu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index da463db9f12a..1b12825e2df1 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -1005,7 +1005,7 @@ static int viommu_of_xlate(struct device *dev, struct 
of_phandle_args *args)
return iommu_fwspec_add_ids(dev, args->args, 1);
  }
  
-static bool viommu_capable(enum iommu_cap cap)

+static bool viommu_capable(struct device *dev, enum iommu_cap cap)
  {
switch (cap) {
case IOMMU_CAP_CACHE_COHERENCY:



Re: [PATCH v6 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-09-07 Thread Robin Murphy

On 2022-09-07 14:47, Jason Gunthorpe wrote:

On Wed, Sep 07, 2022 at 02:41:54PM +0200, Joerg Roedel wrote:

On Mon, Aug 15, 2022 at 11:14:33AM -0700, Nicolin Chen wrote:

Provide a dedicated errno from the IOMMU driver during attach that the
reason the attach failed is domain incompatibility. EMEDIUMTYPE
is chosen because it is never used within the iommu subsystem today and
evokes a sense that the 'medium' aka the domain is incompatible.


I am not a fan of re-using EMEDIUMTYPE or any other special value. What
is needed here is EINVAL, but with a way to tell the caller which of the
function parameters is actually invalid.


Using errnos to indicate the nature of failure is a well established
unix practice, it is why we have hundreds of error codes and don't
just return -EINVAL for everything.

What don't you like about it?

Would you be happier if we wrote it like

  #define IOMMU_EINCOMPATIBLE_DEVICE xx

Which tells "which of the function parameters is actually invalid" ?


FWIW, we're now very close to being able to validate dev->iommu against 
where the domain came from in core code, and so short-circuit 
->attach_dev entirely if they don't match. At that point -EINVAL at the 
driver callback level could be assumed to refer to the domain argument, 
while anything else could be taken as something going unexpectedly wrong 
when the attach may otherwise have worked. I've forgotten if we actually 
had a valid case anywhere for "this is my device but even if you retry 
with a different domain it's still never going to work", but I think we 
wouldn't actually need that anyway - it should be clear enough to a 
caller that if attaching to an existing domain fails, then allocating a 
fresh domain and attaching also fails, that's the point to give up.


Robin.


Re: [PATCH v3] iommu/virtio: Fix interaction with VFIO

2022-08-25 Thread Robin Murphy

On 2022-08-25 16:46, Jean-Philippe Brucker wrote:

Commit e8ae0e140c05 ("vfio: Require that devices support DMA cache
coherence") requires IOMMU drivers to advertise
IOMMU_CAP_CACHE_COHERENCY, in order to be used by VFIO. Since VFIO does
not provide to userspace the ability to maintain coherency through cache
invalidations, it requires hardware coherency. Advertise the capability
in order to restore VFIO support.

The meaning of IOMMU_CAP_CACHE_COHERENCY also changed from "IOMMU can
enforce cache coherent DMA transactions" to "IOMMU_CACHE is supported".


Argh! Massive apologies, I've been totally overlooking that detail and 
forgetting that we ended up splitting out the dedicated 
enforce_cache_coherency op... I do need reminding sometimes :)



While virtio-iommu cannot enforce coherency (of PCIe no-snoop
transactions), it does support IOMMU_CACHE.

We can distinguish different cases of non-coherent DMA:

(1) When accesses from a hardware endpoint are not coherent. The host
 would describe such a device using firmware methods ('dma-coherent'
 in device-tree, '_CCA' in ACPI), since they are also needed without
 a vIOMMU. In this case mappings are created without IOMMU_CACHE.
 virtio-iommu doesn't need any additional support. It sends the same
 requests as for coherent devices.

(2) When the physical IOMMU supports non-cacheable mappings. Supporting
 those would require a new feature in virtio-iommu, new PROBE request
 property and MAP flags. Device drivers would use a new API to
 discover this since it depends on the architecture and the physical
 IOMMU.

(3) When the hardware supports PCIe no-snoop. It is possible for
 assigned PCIe devices to issue no-snoop transactions, and the
 virtio-iommu specification is lacking any mention of this.

 Arm platforms don't necessarily support no-snoop, and those that do
 cannot enforce coherency of no-snoop transactions. Device drivers
 must be careful about assuming that no-snoop transactions won't end
 up cached; see commit e02f5c1bb228 ("drm: disable uncached DMA
 optimization for ARM and arm64"). On x86 platforms, the host may or
 may not enforce coherency of no-snoop transactions with the physical
 IOMMU. But according to the above commit, on x86 a driver which
 assumes that no-snoop DMA is compatible with uncached CPU mappings
 will also work if the host enforces coherency.

 Although these issues are not specific to virtio-iommu, it could be
 used to facilitate discovery and configuration of no-snoop. This
 would require a new feature bit, PROBE property and ATTACH/MAP
 flags.


Interpreted in the *correct* context, I do think this is objectively 
less wrong than before. We can't guarantee that the underlying 
implementation will respect cacheable mappings, but it is true that we 
can do everything in our power to ask for them.


Reviewed-by: Robin Murphy 


Cc: sta...@vger.kernel.org
Fixes: e8ae0e140c05 ("vfio: Require that devices support DMA cache coherence")
Signed-off-by: Jean-Philippe Brucker 
---
Since v2 [1], I tried to refine the commit message.
This fix is needed for v5.19 and v6.0.

I can improve the check once Robin's change [2] is merged:
capable(IOMMU_CAP_CACHE_COHERENCY) could return dev->dma_coherent for
case (1) above.

[1] 
https://lore.kernel.org/linux-iommu/20220818163801.1011548-1-jean-phili...@linaro.org/
[2] 
https://lore.kernel.org/linux-iommu/d8bd8777d06929ad8f49df7fc80e1b9af32a41b5.1660574547.git.robin.mur...@arm.com/
---
  drivers/iommu/virtio-iommu.c | 11 +++
  1 file changed, 11 insertions(+)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 08eeafc9529f..80151176ba12 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -1006,7 +1006,18 @@ static int viommu_of_xlate(struct device *dev, struct 
of_phandle_args *args)
return iommu_fwspec_add_ids(dev, args->args, 1);
  }
  
+static bool viommu_capable(enum iommu_cap cap)

+{
+   switch (cap) {
+   case IOMMU_CAP_CACHE_COHERENCY:
+   return true;
+   default:
+   return false;
+   }
+}
+
  static struct iommu_ops viommu_ops = {
+   .capable= viommu_capable,
.domain_alloc   = viommu_domain_alloc,
.probe_device   = viommu_probe_device,
.probe_finalize = viommu_probe_finalize,



Re: [PATCH v2] iommu/virtio: Fix interaction with VFIO

2022-08-19 Thread Robin Murphy

On 2022-08-19 11:38, Jean-Philippe Brucker wrote:

On Thu, Aug 18, 2022 at 09:10:25PM +0100, Robin Murphy wrote:

On 2022-08-18 17:38, Jean-Philippe Brucker wrote:

Commit e8ae0e140c05 ("vfio: Require that devices support DMA cache
coherence") requires IOMMU drivers to advertise
IOMMU_CAP_CACHE_COHERENCY, in order to be used by VFIO. Since VFIO does
not provide to userspace the ability to maintain coherency through cache
invalidations, it requires hardware coherency. Advertise the capability
in order to restore VFIO support.

The meaning of IOMMU_CAP_CACHE_COHERENCY also changed from "IOMMU can
enforce cache coherent DMA transactions" to "IOMMU_CACHE is supported".
While virtio-iommu cannot enforce coherency (of PCIe no-snoop
transactions), it does support IOMMU_CACHE.

Non-coherent accesses are not currently a concern for virtio-iommu
because host OSes only assign coherent devices,


Is that guaranteed though? I see nothing in VFIO checking *device*
coherency, only that the *IOMMU* can impose it via this capability, which
would form a very circular argument.


Yes the wording is wrong here, more like "host OSes only assign devices
whose accesses are coherent". And it's not guaranteed, just I'm still
looking for a realistic counter-example. I guess a good indicator would be
a VMM that presents a device without 'dma-coherent'.


vfio-amba with the pl330 on Juno, perhaps?


We can no longer say that in practice
nobody has a VFIO-capable IOMMU in front of non-coherent PCI, now that
Rockchip RK3588 boards are about to start shipping (at best we can only say
that they still won't have the SMMUs in the DT until I've finished ripping
up the bus ops).


Ah, I was hoping that vfio-pci should only be concerned about no-snoop. Do
you know if your series [2] ensures that the SMMU driver doesn't report
IOMMU_CAP_CACHE_COHERENCY for that system?


It should do, since the downstream DT says the SMMU is non-coherent.


and the guest does not
enable PCIe no-snoop. Nevertheless, we can summarize here the possible
support for non-coherent DMA:

(1) When accesses from a hardware endpoint are not coherent. The host
  would describe such a device using firmware methods ('dma-coherent'
  in device-tree, '_CCA' in ACPI), since they are also needed without
  a vIOMMU. In this case mappings are created without IOMMU_CACHE.
  virtio-iommu doesn't need any additional support. It sends the same
  requests as for coherent devices.

(2) When the physical IOMMU supports non-cacheable mappings. Supporting
  those would require a new feature in virtio-iommu, new PROBE request
  property and MAP flags. Device drivers would use a new API to
  discover this since it depends on the architecture and the physical
  IOMMU.

(3) When the hardware supports PCIe no-snoop. Some architecture do not
  support this either (whether no-snoop is supported by an Arm system
  is not discoverable by software). If Linux did enable no-snoop in
  endpoints on x86, then virtio-iommu would need additional feature,
  PROBE property, ATTACH and/or MAP flags to support enforcing snoop.


That's not an "if" - various Linux drivers *do* use no-snoop, which IIUC is
the main reason for VFIO wanting to enforce this in the first place. For
example, see the big fat comment in drm_arch_can_wc_memory() if you've
forgotten the fun we had with AMD GPUs in the TX2 boxes back in the day ;)


Ah duh, I missed that PCI_EXP_DEVCTL_NOSNOOP_EN defaults to 1, of course
it does. So I think VFIO should clear it on Arm and make it read-only,
since the SMMU can't force-snoop like on x86. I'd be tempted to do that if
CONFIG_ARM{,64} is enabled, but checking a new IOMMU capability may be
cleaner.


I think that's a good idea, but IIRC Jason mentioned in review of the 
VFIO series that it's not sufficient to provide the actual guarantee 
we're after, since there are out-of-spec devices that ignore the control 
and may send no-snoop packets anyway. However, as part of a best-effort 
approach for arm64 it still makes sense to help all the well-behaved 
drivers/devices do the right thing.
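(For concreteness, the best-effort part would amount to something like the sketch below when handing a device to userspace, with the caveat above that out-of-spec endpoints ignoring the control bit defeat it; VFIO would additionally need to make the bit read-only to the guest:)

#include <linux/pci.h>

/* Best-effort: stop well-behaved endpoints from emitting no-snoop TLPs.
 * Illustrative sketch, not existing VFIO code. */
static void disable_pci_nosnoop(struct pci_dev *pdev)
{
	pcie_capability_clear_word(pdev, PCI_EXP_DEVCTL,
				   PCI_EXP_DEVCTL_NOSNOOP_EN);
}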


Cheers,
Robin.


Re: [PATCH v2] iommu/virtio: Fix interaction with VFIO

2022-08-18 Thread Robin Murphy

On 2022-08-18 17:38, Jean-Philippe Brucker wrote:

Commit e8ae0e140c05 ("vfio: Require that devices support DMA cache
coherence") requires IOMMU drivers to advertise
IOMMU_CAP_CACHE_COHERENCY, in order to be used by VFIO. Since VFIO does
not provide to userspace the ability to maintain coherency through cache
invalidations, it requires hardware coherency. Advertise the capability
in order to restore VFIO support.

The meaning of IOMMU_CAP_CACHE_COHERENCY also changed from "IOMMU can
enforce cache coherent DMA transactions" to "IOMMU_CACHE is supported".
While virtio-iommu cannot enforce coherency (of PCIe no-snoop
transactions), it does support IOMMU_CACHE.

Non-coherent accesses are not currently a concern for virtio-iommu
because host OSes only assign coherent devices,


Is that guaranteed though? I see nothing in VFIO checking *device* 
coherency, only that the *IOMMU* can impose it via this capability, 
which would form a very circular argument. We can no longer say that in 
practice nobody has a VFIO-capable IOMMU in front of non-coherent PCI, 
now that Rockchip RK3588 boards are about to start shipping (at best we 
can only say that they still won't have the SMMUs in the DT until I've 
finished ripping up the bus ops).



and the guest does not
enable PCIe no-snoop. Nevertheless, we can summarize here the possible
support for non-coherent DMA:

(1) When accesses from a hardware endpoint are not coherent. The host
 would describe such a device using firmware methods ('dma-coherent'
 in device-tree, '_CCA' in ACPI), since they are also needed without
 a vIOMMU. In this case mappings are created without IOMMU_CACHE.
 virtio-iommu doesn't need any additional support. It sends the same
 requests as for coherent devices.

(2) When the physical IOMMU supports non-cacheable mappings. Supporting
 those would require a new feature in virtio-iommu, new PROBE request
 property and MAP flags. Device drivers would use a new API to
 discover this since it depends on the architecture and the physical
 IOMMU.

(3) When the hardware supports PCIe no-snoop. Some architectures do not
 support this either (whether no-snoop is supported by an Arm system
 is not discoverable by software). If Linux did enable no-snoop in
 endpoints on x86, then virtio-iommu would need an additional feature,
 PROBE property, and ATTACH and/or MAP flags to support enforcing snoop.


That's not an "if" - various Linux drivers *do* use no-snoop, which IIUC 
is the main reason for VFIO wanting to enforce this in the first place. 
For example, see the big fat comment in drm_arch_can_wc_memory() if 
you've forgotten the fun we had with AMD GPUs in the TX2 boxes back in 
the day ;)


This is what I was getting at in reply to v1, it's really not a "this is 
fine as things stand" kind of patch, it's a "this is the best we can do 
to be less wrong for expected usage, but still definitely not right". 
Admittedly I downplayed that a little in [2] by deliberately avoiding 
all mention of no-snoop, but only because that's such a horrific 
unsolvable mess it's hardly worth the pain of bringing up...


Cheers,
Robin.


Fixes: e8ae0e140c05 ("vfio: Require that devices support DMA cache coherence")
Signed-off-by: Jean-Philippe Brucker 
---

Since v1 [1], I added some details to the commit message. This fix is
still needed for v5.19 and v6.0.

I can improve the check once Robin's change [2] is merged:
capable(IOMMU_CAP_CACHE_COHERENCY) could return dev->dma_coherent for
case (1) above.
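
(Purely as a sketch of that improvement, assuming the capable() op eventually
takes the struct device; note dev->dma_coherent only exists on architectures
that track per-device coherency, so a real version would go through a proper
accessor:)

static bool viommu_capable(struct device *dev, enum iommu_cap cap)
{
	switch (cap) {
	case IOMMU_CAP_CACHE_COHERENCY:
		/* case (1): firmware marked this endpoint non-coherent */
		return dev->dma_coherent;
	default:
		return false;
	}
}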

[1] 
https://lore.kernel.org/linux-iommu/20220714111059.708735-1-jean-phili...@linaro.org/
[2] 
https://lore.kernel.org/linux-iommu/d8bd8777d06929ad8f49df7fc80e1b9af32a41b5.1660574547.git.robin.mur...@arm.com/
---
  drivers/iommu/virtio-iommu.c | 11 +++
  1 file changed, 11 insertions(+)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 08eeafc9529f..80151176ba12 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -1006,7 +1006,18 @@ static int viommu_of_xlate(struct device *dev, struct 
of_phandle_args *args)
return iommu_fwspec_add_ids(dev, args->args, 1);
  }
  
+static bool viommu_capable(enum iommu_cap cap)

+{
+   switch (cap) {
+   case IOMMU_CAP_CACHE_COHERENCY:
+   return true;
+   default:
+   return false;
+   }
+}
+
  static struct iommu_ops viommu_ops = {
+   .capable= viommu_capable,
.domain_alloc   = viommu_domain_alloc,
.probe_device   = viommu_probe_device,
.probe_finalize = viommu_probe_finalize,



Re: [PATCH] iommu/virtio: Advertise IOMMU_CAP_CACHE_COHERENCY

2022-07-14 Thread Robin Murphy

On 2022-07-14 14:00, Jean-Philippe Brucker wrote:

On Thu, Jul 14, 2022 at 01:01:37PM +0100, Robin Murphy wrote:

On 2022-07-14 12:11, Jean-Philippe Brucker wrote:

Fix virtio-iommu interaction with VFIO, as VFIO now requires
IOMMU_CAP_CACHE_COHERENCY. virtio-iommu does not support non-cacheable
mappings, and always expects to be called with IOMMU_CACHE.


Can we know this is actually true though? What if the virtio-iommu
implementation is backed by something other than VFIO, and the underlying
hardware isn't coherent? AFAICS the spec doesn't disallow that.


Right, I should add a note about that. If someone does actually want to
support non-coherent device, I assume we'll add a per-device property, a
'non-cacheable' mapping flag, and IOMMU_CAP_CACHE_COHERENCY will hold.
I'm also planning to add a check on (IOMMU_CACHE && !IOMMU_NOEXEC) in
viommu_map(), but not as a fix.
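
(Roughly, as a sketch of that check at the top of viommu_map():)

	/* virtio-iommu mappings are always cacheable and executable */
	if (!(prot & IOMMU_CACHE) || (prot & IOMMU_NOEXEC))
		return -EINVAL;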


But what about all the I/O-coherent PL330s? :P (IIRC you can actually 
make a Juno do that with S2CR.MTCFG hacks...)



In the meantime we do need to restore VFIO support under virtio-iommu,
since userspace still expects that to work, and the existing use-cases are
coherent devices.


Yeah, I'm not necessarily against adding this as a horrible bodge for 
now - the reality is that people using VFIO must be doing it on coherent 
systems or it wouldn't be working properly anyway - as long as we all 
agree that that's what it is.


Next cycle I'll be sending the follow-up patches to bring 
device_iommu_capable() to its final form (hoping the outstanding VDPA 
patch lands in the meantime), at which point we get to sort-of-fix the 
SMMU drivers[1], and can do something similar here too. I guess the main 
question for virtio-iommu is whether it needs to be described/negotiated 
in the protocol itself, or can be reliably described by other standard 
firmware properties (with maybe just a spec note to clarify that 
coherency must be consistent).


Cheers,
Robin.

[1] 
https://gitlab.arm.com/linux-arm/linux-rm/-/commit/d8256bf48c8606cbaa6f0815696c2a6dbb72f1b0



Re: [PATCH] iommu/virtio: Advertise IOMMU_CAP_CACHE_COHERENCY

2022-07-14 Thread Robin Murphy

On 2022-07-14 12:11, Jean-Philippe Brucker wrote:

Fix virtio-iommu interaction with VFIO, as VFIO now requires
IOMMU_CAP_CACHE_COHERENCY. virtio-iommu does not support non-cacheable
mappings, and always expects to be called with IOMMU_CACHE.


Can we know this is actually true though? What if the virtio-iommu 
implementation is backed by something other than VFIO, and the 
underlying hardware isn't coherent? AFAICS the spec doesn't disallow that.


Thanks,
Robin.


Fixes: e8ae0e140c05 ("vfio: Require that devices support DMA cache coherence")
Signed-off-by: Jean-Philippe Brucker 
---
  drivers/iommu/virtio-iommu.c | 11 +++
  1 file changed, 11 insertions(+)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 25be4b822aa0..bf340d779c10 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -1006,7 +1006,18 @@ static int viommu_of_xlate(struct device *dev, struct 
of_phandle_args *args)
return iommu_fwspec_add_ids(dev, args->args, 1);
  }
  
+static bool viommu_capable(enum iommu_cap cap)

+{
+   switch (cap) {
+   case IOMMU_CAP_CACHE_COHERENCY:
+   return true;
+   default:
+   return false;
+   }
+}
+
  static struct iommu_ops viommu_ops = {
+   .capable= viommu_capable,
.domain_alloc   = viommu_domain_alloc,
.probe_device   = viommu_probe_device,
.probe_finalize = viommu_probe_finalize,



Re: [PATCH] vdpa: Use device_iommu_capable()

2022-07-07 Thread Robin Murphy

On 2022-06-08 12:48, Robin Murphy wrote:

Use the new interface to check the capability for our device
specifically.


Just checking in case this got lost - vdpa is now the only remaining 
iommu_capable() user in linux-next, and I'd like to be able to remove 
the old interface next cycle.


Thanks,
Robin.


Signed-off-by: Robin Murphy 
---
  drivers/vhost/vdpa.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 935a1d0ddb97..4cfebcc24a03 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -1074,7 +1074,7 @@ static int vhost_vdpa_alloc_domain(struct vhost_vdpa *v)
if (!bus)
return -EFAULT;
  
-	if (!iommu_capable(bus, IOMMU_CAP_CACHE_COHERENCY))

+   if (!device_iommu_capable(dma_dev, IOMMU_CAP_CACHE_COHERENCY))
return -ENOTSUPP;
  
  	v->domain = iommu_domain_alloc(bus);



Re: [PATCH v4 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-07-01 Thread Robin Murphy

On 01/07/2022 5:43 pm, Nicolin Chen wrote:

On Fri, Jul 01, 2022 at 11:21:48AM +0100, Robin Murphy wrote:


diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 2ed3594f384e..072cac5ab5a4 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1135,10 +1135,8 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
   struct arm_smmu_device *smmu;
   int ret;

- if (!fwspec || fwspec->ops != &arm_smmu_ops) {
- dev_err(dev, "cannot attach to SMMU, is it on the same bus?\n");
- return -ENXIO;
- }
+ if (!fwspec || fwspec->ops != &arm_smmu_ops)
+ return -EMEDIUMTYPE;


This is the wrong check, you want the "if (smmu_domain->smmu != smmu)"
condition further down. If this one fails it's effectively because the
device doesn't have an IOMMU at all, and similar to patch #3 it will be


Thanks for the review! I will fix that. The "on the same bus" is
quite eye-catching.


removed once the core code takes over properly (I even have both those
patches written now!)


Actually, in my v1 the proposed ops check also returned -EMEDIUMTYPE
upon an ops mismatch, treating that too as an incompatibility.
Do you mean that we should have made it more fine-grained?


On second look, I think this particular check was already entirely 
redundant by the time I made the fwspec conversion to it, oh well. Since 
it remains harmless for the time being, let's just ignore it entirely 
until we can confidently say goodbye to the whole lot[1].


I don't think there's any need to differentiate an instance mismatch 
from a driver mismatch, once the latter becomes realistically possible, 
mostly due to iommu_domain_alloc() also having to become device-aware to 
know which driver to allocate from. Thus as far as a user is concerned, 
if attaching a device to an existing domain fails with -EMEDIUMTYPE, 
allocating a new domain using the given device, and attaching to that, 
can be expected to succeed, regardless of why the original attempt was 
rejected. In fact even in the theoretical different-driver-per-bus model 
the same principle still holds up.
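
(As a hypothetical caller-side sketch of that contract - existing_domain,
new_domain and group are illustrative names, not the actual VFIO code:)

	ret = iommu_attach_group(existing_domain, group);
	if (ret == -EMEDIUMTYPE) {
		/* Wrong domain for this device: allocate a fresh one and retry */
		new_domain = iommu_domain_alloc(dev->bus);
		if (!new_domain)
			return -ENOMEM;
		ret = iommu_attach_group(new_domain, group);
	}
	if (ret)
		return ret;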


Thanks,
Robin.

[1] 
https://gitlab.arm.com/linux-arm/linux-rm/-/commit/aa4accfa4a10e92daad0d51095918e8a89014393



Re: [PATCH v4 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-07-01 Thread Robin Murphy

On 2022-06-30 21:36, Nicolin Chen wrote:

Cases like VFIO wish to attach a device to an existing domain that was
not allocated specifically from the device. This raises a condition
where the IOMMU driver can fail the domain attach because the domain and
device are incompatible with each other.

This is a soft failure that can be resolved by using a different domain.

Provide a dedicated errno from the IOMMU driver during attach, so that the
reason the attach failed is known to be domain incompatibility. EMEDIUMTYPE
is chosen because it is never used within the iommu subsystem today and
evokes a sense that the 'medium' aka the domain is incompatible.

VFIO can use this to know attach is a soft failure and it should continue
searching. Otherwise the attach will be a hard failure and VFIO will
return the code to userspace.

Update all drivers to return EMEDIUMTYPE in their failure paths that are
related to domain incompatibility. Also remove adjacent error prints for
these soft failures, to prevent kernel log spam, since -EMEDIUMTYPE is
clear enough to indicate an incompatibility error.

Add kdocs describing this behavior.

Suggested-by: Jason Gunthorpe 
Reviewed-by: Kevin Tian 
Signed-off-by: Nicolin Chen 
---

[...]

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 2ed3594f384e..072cac5ab5a4 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1135,10 +1135,8 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
struct arm_smmu_device *smmu;
int ret;
  
-	if (!fwspec || fwspec->ops != &arm_smmu_ops) {

-   dev_err(dev, "cannot attach to SMMU, is it on the same bus?\n");
-   return -ENXIO;
-   }
+   if (!fwspec || fwspec->ops != &arm_smmu_ops)
+   return -EMEDIUMTYPE;


This is the wrong check, you want the "if (smmu_domain->smmu != smmu)" 
condition further down. If this one fails it's effectively because the 
device doesn't have an IOMMU at all, and similar to patch #3 it will be 
removed once the core code takes over properly (I even have both those 
patches written now!)


Thanks,
Robin.


/*
 * FIXME: The arch/arm DMA API code tries to attach devices to its own



Re: [PATCH v3 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-06-30 Thread Robin Murphy

On 2022-06-29 20:47, Nicolin Chen wrote:

On Fri, Jun 24, 2022 at 03:19:43PM -0300, Jason Gunthorpe wrote:

On Fri, Jun 24, 2022 at 06:35:49PM +0800, Yong Wu wrote:


It's not used in a VFIO context. "return 0" just satisfies the iommu
framework so it can go ahead. And yes, here we only allow the shared
"mapping-domain" (all the devices share a domain created
internally).


What part of the iommu framework is trying to attach a domain and
wants to see success when the domain was not actually attached?


What prevents this driver from being used in a VFIO context?


Nothing prevents it. I just didn't test it.


This is why it is wrong to return success here.


Hi Yong, would you or someone you know be able to confirm whether
this "return 0" is still a must or not?


From memory, it is unfortunately required, due to this driver being in 
the rare position of having to support multiple devices in a single 
address space on 32-bit ARM. Since the old ARM DMA code doesn't 
understand groups, the driver sets up its own canonical 
dma_iommu_mapping to act like a default domain, but then has to politely 
say "yeah OK" to arm_setup_iommu_dma_ops() for each device so that they 
do all end up with the right DMA ops rather than dying in screaming 
failure (the ARM code's per-device mappings then get leaked, but we 
can't really do any better).


The whole mess disappears in the proper default domain conversion, but 
in the meantime, it's still safe to assume that nobody's doing VFIO with 
embedded display/video codec/etc. blocks that don't even have reset drivers.


Thanks,
Robin.


Re: [PATCH v6 00/22] Add generic memory shrinker to VirtIO-GPU and Panfrost DRM drivers

2022-06-28 Thread Robin Murphy

On 2022-05-27 00:50, Dmitry Osipenko wrote:

Hello,

This patchset introduces memory shrinker for the VirtIO-GPU DRM driver
and adds memory purging and eviction support to VirtIO-GPU driver.

The new dma-buf locking convention is introduced here as well.

During OOM, the shrinker will release BOs that are marked as "not needed"
by userspace using the new madvise IOCTL, it will also evict idling BOs
to SWAP. The userspace in this case is the Mesa VirGL driver, it will mark
the cached BOs as "not needed", allowing kernel driver to release memory
of the cached shmem BOs on lowmem situations, preventing OOM kills.

The Panfrost driver is switched to use generic memory shrinker.


I think we still have some outstanding issues here - Alyssa reported 
some weirdness yesterday, so I just tried provoking a low-memory 
condition locally with this series applied and a few debug options 
enabled, and the results as below were... interesting.


Thanks,
Robin.

->8-
[   68.295951] ==
[   68.295956] WARNING: possible circular locking dependency detected
[   68.295963] 5.19.0-rc3+ #400 Not tainted
[   68.295972] --
[   68.295977] cc1/295 is trying to acquire lock:
[   68.295986] 08d7f1a0 
(reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_shmem_free+0x7c/0x198

[   68.296036]
[   68.296036] but task is already holding lock:
[   68.296041] 8c14b820 (fs_reclaim){+.+.}-{0:0}, at: 
__alloc_pages_slowpath.constprop.0+0x4d8/0x1470

[   68.296080]
[   68.296080] which lock already depends on the new lock.
[   68.296080]
[   68.296085]
[   68.296085] the existing dependency chain (in reverse order) is:
[   68.296090]
[   68.296090] -> #1 (fs_reclaim){+.+.}-{0:0}:
[   68.296111]fs_reclaim_acquire+0xb8/0x150
[   68.296130]dma_resv_lockdep+0x298/0x3fc
[   68.296148]do_one_initcall+0xe4/0x5f8
[   68.296163]kernel_init_freeable+0x414/0x49c
[   68.296180]kernel_init+0x2c/0x148
[   68.296195]ret_from_fork+0x10/0x20
[   68.296207]
[   68.296207] -> #0 (reservation_ww_class_mutex){+.+.}-{3:3}:
[   68.296229]__lock_acquire+0x1724/0x2398
[   68.296246]lock_acquire+0x218/0x5b0
[   68.296260]__ww_mutex_lock.constprop.0+0x158/0x2378
[   68.296277]ww_mutex_lock+0x7c/0x4d8
[   68.296291]drm_gem_shmem_free+0x7c/0x198
[   68.296304]panfrost_gem_free_object+0x118/0x138
[   68.296318]drm_gem_object_free+0x40/0x68
[   68.296334]drm_gem_shmem_shrinker_run_objects_scan+0x42c/0x5b8
[   68.296352]drm_gem_shmem_shrinker_scan_objects+0xa4/0x170
[   68.296368]do_shrink_slab+0x220/0x808
[   68.296381]shrink_slab+0x11c/0x408
[   68.296392]shrink_node+0x6ac/0xb90
[   68.296403]do_try_to_free_pages+0x1dc/0x8d0
[   68.296416]try_to_free_pages+0x1ec/0x5b0
[   68.296429]__alloc_pages_slowpath.constprop.0+0x528/0x1470
[   68.296444]__alloc_pages+0x4e0/0x5b8
[   68.296455]__folio_alloc+0x24/0x60
[   68.296467]vma_alloc_folio+0xb8/0x2f8
[   68.296483]alloc_zeroed_user_highpage_movable+0x58/0x68
[   68.296498]__handle_mm_fault+0x918/0x12a8
[   68.296513]handle_mm_fault+0x130/0x300
[   68.296527]do_page_fault+0x1d0/0x568
[   68.296539]do_translation_fault+0xa0/0xb8
[   68.296551]do_mem_abort+0x68/0xf8
[   68.296562]el0_da+0x74/0x100
[   68.296572]el0t_64_sync_handler+0x68/0xc0
[   68.296585]el0t_64_sync+0x18c/0x190
[   68.296596]
[   68.296596] other info that might help us debug this:
[   68.296596]
[   68.296601]  Possible unsafe locking scenario:
[   68.296601]
[   68.296604]        CPU0                    CPU1
[   68.296608]        ----                    ----
[   68.296612]   lock(fs_reclaim);
[   68.296622]                                lock(reservation_ww_class_mutex);
[   68.296633]                                lock(fs_reclaim);
[   68.296644]   lock(reservation_ww_class_mutex);
[   68.296654]
[   68.296654]  *** DEADLOCK ***
[   68.296654]
[   68.296658] 3 locks held by cc1/295:
[   68.29]  #0: 0616e898 (&mm->mmap_lock){}-{3:3}, at: 
do_page_fault+0x144/0x568
[   68.296702]  #1: 8c14b820 (fs_reclaim){+.+.}-{0:0}, at: 
__alloc_pages_slowpath.constprop.0+0x4d8/0x1470
[   68.296740]  #2: 8c1215b0 (shrinker_rwsem){}-{3:3}, at: 
shrink_slab+0xc0/0x408

[   68.296774]
[   68.296774] stack backtrace:
[   68.296780] CPU: 2 PID: 295 Comm: cc1 Not tainted 5.19.0-rc3+ #400
[   68.296794] Hardware name: ARM LTD ARM Juno Development Platform/ARM 
Juno Development Platform, BIOS EDK II Sep  3 2019

[   68.296803] Call trace:
[   68.296808]  dump_backtrace+0x1e4/0x1f0
[   68.296821]  show_stack+0x20/0x70
[   68.296832]  dump_stack_lvl+0x8c/0xb8
[   68.296849]  dump_stack+0x1c/0x38
[   68.296864]  print_circular_bug.isra.0+0x284/0x378
[   68.296881]  check_noncircular+0x1d8/0x1f8

Re: [PATCH v2 3/5] vfio/iommu_type1: Remove the domain->ops comparison

2022-06-24 Thread Robin Murphy

On 2022-06-24 14:16, Jason Gunthorpe wrote:

On Wed, Jun 22, 2022 at 08:54:45AM +0100, Robin Murphy wrote:

On 2022-06-16 23:23, Nicolin Chen wrote:

On Thu, Jun 16, 2022 at 06:40:14AM +, Tian, Kevin wrote:


The domain->ops validation was added, as a precaution, for mixed-driver
systems. However, at this moment only one iommu driver is possible. So
remove it.


It's true on a physical platform. But I'm not sure whether a virtual platform
is allowed to include multiple, e.g. one virtio-iommu alongside a virtual VT-d
or a virtual SMMU. It might be clearer to claim that (as Robin pointed out)
there are plenty more significant problems than this to solve, instead of simply
saying that only one iommu driver is possible when we don't have explicit code
to reject such a configuration.


Will edit this part. Thanks!


Oh, physical platforms with mixed IOMMUs definitely exist already. The main
point is that while bus_set_iommu still exists, the core code effectively
*does* prevent multiple drivers from registering - even in emulated cases
like the example above, virtio-iommu and VT-d would both try to
bus_set_iommu(_bus_type), and one of them will lose. The aspect which
might warrant clarification is that there's no combination of supported
drivers which claim non-overlapping buses *and* could appear in the same
system - even if you tried to contrive something by emulating, say, VT-d
(PCI) alongside rockchip-iommu (platform), you could still only describe one
or the other due to ACPI vs. Devicetree.


Right, and that is still something we need to protect against with
this ops check. VFIO is not checking that the buses are the same
before attempting to re-use a domain.

So it is actually functional and does protect against systems with
multiple iommu drivers on different busses.


But as above, which systems *are* those? Everything that's on my radar 
would have drivers all competing for the platform bus - Intel and s390 
are somewhat the odd ones out in that respect, but are also non-issues 
as above. FWIW my iommu/bus dev branch has got as far as the final bus 
ops removal and allowing multiple driver registrations, and before it 
allows that, it does now have the common attach check that I sketched 
out in the previous discussion of this.


It's probably also noteworthy that domain->ops is no longer the same 
domain->ops that this code was written to check, and may now be 
different between domains from the same driver.


Thanks,
Robin.

Re: [PATCH v2 3/5] vfio/iommu_type1: Remove the domain->ops comparison

2022-06-22 Thread Robin Murphy

On 2022-06-16 23:23, Nicolin Chen wrote:

On Thu, Jun 16, 2022 at 06:40:14AM +, Tian, Kevin wrote:


The domain->ops validation was added, as a precaution, for mixed-driver
systems. However, at this moment only one iommu driver is possible. So
remove it.


It's true on a physical platform. But I'm not sure whether a virtual platform
is allowed to include multiple, e.g. one virtio-iommu alongside a virtual VT-d
or a virtual SMMU. It might be clearer to claim that (as Robin pointed out)
there are plenty more significant problems than this to solve, instead of simply
saying that only one iommu driver is possible when we don't have explicit code
to reject such a configuration. 


Will edit this part. Thanks!


Oh, physical platforms with mixed IOMMUs definitely exist already. The 
main point is that while bus_set_iommu still exists, the core code 
effectively *does* prevent multiple drivers from registering - even in 
emulated cases like the example above, virtio-iommu and VT-d would both 
try to bus_set_iommu(&pci_bus_type), and one of them will lose. The 
aspect which might warrant clarification is that there's no combination 
of supported drivers which claim non-overlapping buses *and* could 
appear in the same system - even if you tried to contrive something by 
emulating, say, VT-d (PCI) alongside rockchip-iommu (platform), you 
could still only describe one or the other due to ACPI vs. Devicetree.


Thanks,
Robin.

Re: [PATCH v2 5/5] vfio/iommu_type1: Simplify group attachment

2022-06-20 Thread Robin Murphy

On 2022-06-17 03:53, Tian, Kevin wrote:

From: Nicolin Chen 
Sent: Friday, June 17, 2022 6:41 AM


...

- if (resv_msi) {
+ if (resv_msi && !domain->msi_cookie) {
   ret = iommu_get_msi_cookie(domain->domain,
resv_msi_base);
   if (ret && ret != -ENODEV)
   goto out_detach;
+ domain->msi_cookie = true;
   }


why not move this to alloc_attach_domain()? Then there's no need for the new
domain field, since it's required only when a new domain is allocated.


When reusing an existing domain that doesn't have an msi_cookie,
we can do iommu_get_msi_cookie() if resv_msi is found. So it is
not limited to a new domain.


It looks like the msi_cookie requirement is per-platform (currently only
for SMMU; see arm_smmu_get_resv_regions()). If there is
no mixed case then the above check is not required.

But let's hear whether Robin has a different thought here.


Yes, the cookie should logically be tied to the lifetime of the domain 
itself. In the relevant context, "an existing domain that doesn't have 
an msi_cookie" should never exist.


Thanks,
Robin.


[PATCH] vdpa: Use device_iommu_capable()

2022-06-08 Thread Robin Murphy
Use the new interface to check the capability for our device
specifically.

Signed-off-by: Robin Murphy 
---
 drivers/vhost/vdpa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 935a1d0ddb97..4cfebcc24a03 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -1074,7 +1074,7 @@ static int vhost_vdpa_alloc_domain(struct vhost_vdpa *v)
if (!bus)
return -EFAULT;
 
-   if (!iommu_capable(bus, IOMMU_CAP_CACHE_COHERENCY))
+   if (!device_iommu_capable(dma_dev, IOMMU_CAP_CACHE_COHERENCY))
return -ENOTSUPP;
 
v->domain = iommu_domain_alloc(bus);
-- 
2.36.1.dirty



Re: [PATCH 2/5] iommu: Ensure device has the same iommu_ops as the domain

2022-06-06 Thread Robin Murphy

On 2022-06-06 17:51, Nicolin Chen wrote:

Hi Robin,

On Mon, Jun 06, 2022 at 03:33:42PM +0100, Robin Murphy wrote:

On 2022-06-06 07:19, Nicolin Chen wrote:

The core code should not call an iommu driver op with a struct device
parameter unless it knows that the dev_iommu_priv_get() for that struct
device was setup by the same driver. Otherwise in a mixed driver system
the iommu_priv could be casted to the wrong type.


We don't have mixed-driver systems, and there are plenty more
significant problems than this one to solve before we can (but thanks
for pointing it out - I hadn't got as far as auditing the public
interfaces yet). Once domains are allocated via a particular device's
IOMMU instance in the first place, there will be ample opportunity for
the core to stash suitable identifying information in the domain for
itself. TBH even the current code could do it without needing the
weirdly invasive changes here.


Do you have an alternative and less invasive solution in mind?


Store the iommu_ops pointer in the iommu_domain and use it as a check to
validate that the struct device is correct before invoking any domain op
that accepts a struct device.


In fact this even describes exactly that - "Store the iommu_ops pointer
in the iommu_domain", vs. the "Store the iommu_ops pointer in the
iommu_domain_ops" which the patch is actually doing :/


Will fix that.


Well, as before I'd prefer to make the code match the commit message - 
if I really need to spell it out, see below - since I can't imagine that 
we should ever have need to identify a set of iommu_domain_ops in 
isolation, therefore I think it's considerably clearer to use the 
iommu_domain itself. However, either way we really don't need this yet, 
so we may as well just go ahead and remove the redundant test from VFIO 
anyway, and I can add some form of this patch to my dev branch for now.


Thanks,
Robin.

->8-
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index cde2e1d6ab9b..72990edc9314 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1902,6 +1902,7 @@ static struct iommu_domain 
*__iommu_domain_alloc(struct device *dev,

domain->type = type;
/* Assume all sizes by default; the driver may override this later */
domain->pgsize_bitmap = ops->pgsize_bitmap;
+   domain->owner = ops;
if (!domain->ops)
domain->ops = ops->default_domain_ops;

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 6f64cbbc6721..79e557207f53 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -89,6 +89,7 @@ struct iommu_domain_geometry {

 struct iommu_domain {
unsigned type;
+   const struct iommu_ops *owner; /* Who allocated this domain */
const struct iommu_domain_ops *ops;
unsigned long pgsize_bitmap;/* Bitmap of page sizes in use */
iommu_fault_handler_t handler;


Re: [PATCH 2/5] iommu: Ensure device has the same iommu_ops as the domain

2022-06-06 Thread Robin Murphy

On 2022-06-06 07:19, Nicolin Chen wrote:

The core code should not call an iommu driver op with a struct device
parameter unless it knows that the dev_iommu_priv_get() for that struct
device was setup by the same driver. Otherwise in a mixed driver system
the iommu_priv could be casted to the wrong type.


We don't have mixed-driver systems, and there are plenty more 
significant problems than this one to solve before we can (but thanks 
for pointing it out - I hadn't got as far as auditing the public 
interfaces yet). Once domains are allocated via a particular device's 
IOMMU instance in the first place, there will be ample opportunity for 
the core to stash suitable identifying information in the domain for 
itself. TBH even the current code could do it without needing the 
weirdly invasive changes here.



Store the iommu_ops pointer in the iommu_domain and use it as a check to
validate that the struct device is correct before invoking any domain op
that accepts a struct device.


In fact this even describes exactly that - "Store the iommu_ops pointer 
in the iommu_domain", vs. the "Store the iommu_ops pointer in the 
iommu_domain_ops" which the patch is actually doing :/


[...]

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 19cf28d40ebe..8a1f437a51f2 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1963,6 +1963,10 @@ static int __iommu_attach_device(struct iommu_domain 
*domain,
  {
int ret;
  
+	/* Ensure the device was probe'd onto the same driver as the domain */

+   if (dev->bus->iommu_ops != domain->ops->iommu_ops)


Nope, dev_iommu_ops(dev) please. Furthermore I think the logical place 
to put this is in iommu_group_do_attach_device(), since that's the 
gateway for the public interfaces - we shouldn't need to second-guess 
ourselves for internal default-domain-related calls.
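
(Sketch of where that would end up, assuming the domain records its
allocating driver's ops in an owner field as discussed elsewhere:)

static int iommu_group_do_attach_device(struct device *dev, void *data)
{
	struct iommu_domain *domain = data;

	/* Reject devices probed by a different IOMMU driver */
	if (dev_iommu_ops(dev) != domain->owner)
		return -EMEDIUMTYPE;	/* or -EINVAL, per the API discussion */

	return __iommu_attach_device(domain, dev);
}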


Thanks,
Robin.


+   return -EMEDIUMTYPE;
+
if (unlikely(domain->ops->attach_dev == NULL))
return -ENODEV;



Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()

2022-04-07 Thread Robin Murphy

On 2022-04-07 14:59, Jason Gunthorpe wrote:

On Thu, Apr 07, 2022 at 07:18:48AM +, Tian, Kevin wrote:

From: Jason Gunthorpe 
Sent: Thursday, April 7, 2022 1:17 AM

On Wed, Apr 06, 2022 at 06:10:31PM +0200, Christoph Hellwig wrote:

On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote:

On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote:

On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote:

Oh, I didn't know about device_get_dma_attr()..


Which is completely broken for any non-OF, non-ACPI plaform.


I saw that, but I spent some time searching and could not find an
iommu driver that would load independently of OF or ACPI. ie no IOMMU
platform drivers are created by board files. Things like Intel/AMD
discover only from ACPI, etc.


Intel discovers IOMMUs (and optionally ACPI namespace devices) from
ACPI, but there is no ACPI description for PCI devices i.e. the current
logic of device_get_dma_attr() cannot be used on PCI devices.


Oh? So on x86 acpi_get_dma_attr() returns DEV_DMA_NON_COHERENT or
DEV_DMA_NOT_SUPPORTED?


I think it _should_ return DEV_DMA_COHERENT on x86/IA-64 (unless a _CCA 
method was actually present to say otherwise), based on 
acpi_init_coherency(), but I only know for sure what happens on arm64.



I think I should give up on this and just redefine the existing iommu
cap flag to IOMMU_CAP_CACHE_SUPPORTED or something.


TBH I don't see any issue with current name, but I'd certainly be happy 
to nail down a specific definition for it, along the lines of "this 
means that IOMMU_CACHE mappings are generally coherent". That works for 
things like Arm's S2FWB making it OK to assign an otherwise-non-coherent 
device without extra hassle.
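
(e.g. spelled out as a comment on the cap itself - just a sketch of the
wording:)

enum iommu_cap {
	/*
	 * IOMMU_CACHE mappings made through this IOMMU are coherent with
	 * the CPU caches, i.e. DMA through them effectively snoops,
	 * however that is achieved (including cases like Arm's S2FWB).
	 */
	IOMMU_CAP_CACHE_COHERENCY,
	/* ... */
};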


For the specific case of overriding PCIe No Snoop (which is more 
problematic from an Arm SMMU PoV) when assigning to a VM, would that not 
be easier solved by just having vfio-pci clear the "Enable No Snoop" 
control bit in the endpoint's PCIe capability?



We could alternatively use existing device_get_dma_attr() as a default
with an iommu wrapper and push the exception down through the iommu
driver and s390 can override it.



if going this way probably device_get_dma_attr() should be renamed to
device_fwnode_get_dma_attr() instead to make it clearer?


I'm looking at the few users:

drivers/ata/ahci_ceva.c
drivers/ata/ahci_qoriq.c
  - These are ARM only drivers. They are trying to copy the dma-coherent
property from its DT/ACPI definition to internal register settings
which look like they tune how the AXI bus transactions are created.

I'm guessing the SATA IP block's AXI interface can be configured to
generate coherent or non-coherent requests and it has to be set
in a way that is consistent with the SOC architecture and match
what the DMA API expects the device will do.

drivers/crypto/ccp/sp-platform.c
  - Only used on ARM64 and also programs a HW register similar to the
sata drivers. Refuses to work if the FW property is not present.

drivers/net/ethernet/amd/xgbe/xgbe-platform.c
  - Seems to be configuring another ARM AXI block

drivers/gpu/drm/panfrost/panfrost_drv.c
  - Robin's commit comment here is good, and one of the things this
controls is if the coherent_walk is set for the io-pgtable-arm.c
code which avoids DMA API calls

drivers/gpu/drm/tegra/uapi.c
  - Returns DRM_TEGRA_CHANNEL_CAP_CACHE_COHERENT to userspace. No idea.

My take is that the drivers using this API are doing it to make sure
their HW blocks are setup in a way that is consistent with the DMA API
they are also using, and run in constrained embedded-style
environments that know the firmware support is present.

So in the end it does not seem suitable right now for linking to
IOMMU_CACHE..


That seems a pretty good summary - I think they're basically all 
"firmware told Linux I'm coherent so I'd better act coherent" cases, but 
that still doesn't necessarily mean that they're *forced* to respect 
that. One of the things on my to-do list is to try adding a 
DMA_ATTR_NO_SNOOP that can force DMA cache maintenance for coherent 
devices, primarily to hook up in Panfrost (where there is a bit of 
performance to claw back on the coherent AmLogic SoCs by leaving certain 
buffers non-cacheable).


Cheers,
Robin.


Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()

2022-04-06 Thread Robin Murphy

On 2022-04-06 15:14, Jason Gunthorpe wrote:

On Wed, Apr 06, 2022 at 03:51:50PM +0200, Christoph Hellwig wrote:

On Wed, Apr 06, 2022 at 09:07:30AM -0300, Jason Gunthorpe wrote:

Didn't see it

I'll move dev_is_dma_coherent to device.h along with
device_iommu_mapped() and others then


No.  It it is internal for a reason.  It also doesn't actually work
outside of the dma core.  E.g. for non-swiotlb ARM configs it will
not actually work.


Really? It is the only condition that dma_info_to_prot() tests to
decide of IOMMU_CACHE is used or not, so you are saying that there is
a condition where a device can be attached to an iommu_domain and
dev_is_dma_coherent() returns the wrong information? How does
dma-iommu.c safely use it then?


The common iommu-dma layer happens to be part of the subset of the DMA 
core which *does* play the dev->dma_coherent game. 32-bit Arm has its 
own IOMMU DMA ops which do not. I don't know if the set of PowerPCs with 
CONFIG_NOT_COHERENT_CACHE intersects the set of PowerPCs that can do 
VFIO, but that would be another example if so.



In any case I still need to do something about the places checking
IOMMU_CAP_CACHE_COHERENCY and thinking that means IOMMU_CACHE
works. Any idea?


Can we improve the IOMMU drivers such that that *can* be the case 
(within a reasonable margin of error)? That's kind of where I was hoping 
to head with device_iommu_capable(), e.g. [1].


Robin.

[1] 
https://gitlab.arm.com/linux-arm/linux-rm/-/commit/53390e9505b3791adedc0974e251e5c7360e402e



Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()

2022-04-06 Thread Robin Murphy

On 2022-04-05 17:16, Jason Gunthorpe wrote:

vdpa and usnic are trying to test if IOMMU_CACHE is supported. The correct
way to do this is via dev_is_dma_coherent()


Not necessarily...

Disregarding the complete disaster of PCIe No Snoop on Arm-Based 
systems, there's the more interesting effectively-opposite scenario 
where an SMMU bridges non-coherent devices to a coherent interconnect. 
It's not something we take advantage of yet in Linux, and it can only be 
properly described in ACPI, but there do exist situations where 
IOMMU_CACHE is capable of making the device's traffic snoop, but 
dev_is_dma_coherent() - and device_get_dma_attr() for external users - 
would still say non-coherent because they can't assume that the SMMU is 
enabled and programmed in just the right way.


I've also not thought too much about how things might look with S2FWB 
thrown into the mix in future...


Robin.


like the DMA API does. If
IOMMU_CACHE is not supported then these drivers won't work as they don't
call any coherency-restoring routines around their DMAs.

Signed-off-by: Jason Gunthorpe 
---
  drivers/infiniband/hw/usnic/usnic_uiom.c | 16 +++-
  drivers/vhost/vdpa.c |  3 ++-
  2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c 
b/drivers/infiniband/hw/usnic/usnic_uiom.c
index 760b254ba42d6b..24d118198ac756 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -42,6 +42,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include "usnic_log.h"

  #include "usnic_uiom.h"
@@ -474,6 +475,12 @@ int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, 
struct device *dev)
struct usnic_uiom_dev *uiom_dev;
int err;
  
+	if (!dev_is_dma_coherent(dev)) {

+   usnic_err("IOMMU of %s does not support cache coherency\n",
+   dev_name(dev));
+   return -EINVAL;
+   }
+
uiom_dev = kzalloc(sizeof(*uiom_dev), GFP_ATOMIC);
if (!uiom_dev)
return -ENOMEM;
@@ -483,13 +490,6 @@ int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, 
struct device *dev)
if (err)
goto out_free_dev;
  
-	if (!iommu_capable(dev->bus, IOMMU_CAP_CACHE_COHERENCY)) {

-   usnic_err("IOMMU of %s does not support cache coherency\n",
-   dev_name(dev));
-   err = -EINVAL;
-   goto out_detach_device;
-   }
-
	spin_lock(&pd->lock);
	list_add_tail(&uiom_dev->link, &pd->devs);
pd->dev_cnt++;
@@ -497,8 +497,6 @@ int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, 
struct device *dev)
  
  	return 0;
  
-out_detach_device:

-   iommu_detach_device(pd->domain, dev);
  out_free_dev:
kfree(uiom_dev);
return err;
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 4c2f0bd062856a..05ea5800febc37 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -22,6 +22,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include "vhost.h"
  
@@ -929,7 +930,7 @@ static int vhost_vdpa_alloc_domain(struct vhost_vdpa *v)

if (!bus)
return -EFAULT;
  
-	if (!iommu_capable(bus, IOMMU_CAP_CACHE_COHERENCY))

+   if (!dev_is_dma_coherent(dma_dev))
return -ENOTSUPP;
  
  	v->domain = iommu_domain_alloc(bus);



Re: [PATCH v1 1/2] drm/qxl: replace ioremap by ioremap_cache on arm64

2022-03-23 Thread Robin Murphy

On 2022-03-23 10:11, Gerd Hoffmann wrote:

On Wed, Mar 23, 2022 at 09:45:13AM +, Robin Murphy wrote:

On 2022-03-23 07:15, Christian König wrote:

Am 22.03.22 um 10:34 schrieb Cong Liu:

qxl use ioremap to map ram_header and rom, in the arm64 implementation,
the device is mapped as DEVICE_nGnRE, it can not support unaligned
access.


Well that some ARM boards doesn't allow unaligned access to MMIO space
is a well known bug of those ARM boards.

So as far as I know this is a hardware bug you are trying to workaround
here and I'm not 100% sure that this is correct.


No, this one's not a bug. The Device memory type used for iomem mappings is
*architecturally* defined to enforce properties like aligned accesses, no
speculation, no reordering, etc. If something wants to be treated more like
RAM than actual MMIO registers, then ioremap_wc() or ioremap_cache() is the
appropriate thing to do in general (with the former being a bit more
portable according to Documentation/driver-api/device-io.rst).


Well, qxl is a virtual device, so it *is* ram.

I'm wondering whether qxl actually works on arm?  As far as I know, all
virtual display devices with (virtual) pci memory bars for vram do not
work on arm due to the guest mapping vram as io memory and the host
mapping vram as normal ram and the mapping attribute mismatch causes
caching troubles (only noticeable on real arm hardware, not in
emulation).  Did something change here recently?


Indeed, Armv8.4 introduced the S2FWB feature to cope with situations 
like this - essentially it allows the hypervisor to share RAM-backed 
pages with the guest without losing coherency regardless of how the 
guest maps them.


Robin.

Re: [PATCH v1 1/2] drm/qxl: replace ioremap by ioremap_cache on arm64

2022-03-23 Thread Robin Murphy

On 2022-03-23 07:15, Christian König wrote:

Am 22.03.22 um 10:34 schrieb Cong Liu:

qxl use ioremap to map ram_header and rom, in the arm64 implementation,
the device is mapped as DEVICE_nGnRE, it can not support unaligned
access.


Well that some ARM boards doesn't allow unaligned access to MMIO space 
is a well known bug of those ARM boards.


So as far as I know this is a hardware bug you are trying to workaround 
here and I'm not 100% sure that this is correct.


No, this one's not a bug. The Device memory type used for iomem mappings 
is *architecturally* defined to enforce properties like aligned 
accesses, no speculation, no reordering, etc. If something wants to be 
treated more like RAM than actual MMIO registers, then ioremap_wc() or 
ioremap_cache() is the appropriate thing to do in general (with the 
former being a bit more portable according to 
Documentation/driver-api/device-io.rst).


Of course *then* you might find that on some systems the 
interconnect/PCIe implementation/endpoint doesn't actually like 
unaligned accesses once the CPU MMU starts allowing them to be sent out, 
but hey, one step at a time ;)
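
(For instance, the more portable shape of the change would be roughly this -
untested sketch, not the patch as posted:)

	qdev->rom = ioremap_wc(qdev->rom_base, qdev->rom_size);
	/* ...and likewise for the ram_header... */
	qdev->ram_header = ioremap_wc(qdev->vram_base +
				      qdev->rom->ram_header_offset,
				      sizeof(*qdev->ram_header));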


Robin.



Christian.



[    6.620515] pc : setup_hw_slot+0x24/0x60 [qxl]
[    6.620961] lr : setup_slot+0x34/0xf0 [qxl]
[    6.621376] sp : 800012b73760
[    6.621701] x29: 800012b73760 x28: 0001 x27: 
1000
[    6.622400] x26: 0001 x25: 0400 x24: 
cf376848c000
[    6.623099] x23: c4087400 x22: cf3718e17140 x21: 

[    6.623823] x20: c4086000 x19: c40870b0 x18: 
0014
[    6.624519] x17: 4d3605ab x16: bb3b6129 x15: 
6e771809
[    6.625214] x14: 0001 x13: 007473696c5f7974 x12: 
696e69615f65
[    6.625909] x11: d543656a x10:  x9 : 
cf3718e085a4
[    6.626616] x8 : 006c7871 x7 : 000a x6 : 
0017
[    6.627343] x5 : 1400 x4 : 800011f63400 x3 : 
1400
[    6.628047] x2 :  x1 : c40870b0 x0 : 
c4086000

[    6.628751] Call trace:
[    6.628994]  setup_hw_slot+0x24/0x60 [qxl]
[    6.629404]  setup_slot+0x34/0xf0 [qxl]
[    6.629790]  qxl_device_init+0x6f0/0x7f0 [qxl]
[    6.630235]  qxl_pci_probe+0xdc/0x1d0 [qxl]
[    6.630654]  local_pci_probe+0x48/0xb8
[    6.631027]  pci_device_probe+0x194/0x208
[    6.631464]  really_probe+0xd0/0x458
[    6.631818]  __driver_probe_device+0x124/0x1c0
[    6.632256]  driver_probe_device+0x48/0x130
[    6.632669]  __driver_attach+0xc4/0x1a8
[    6.633049]  bus_for_each_dev+0x78/0xd0
[    6.633437]  driver_attach+0x2c/0x38
[    6.633789]  bus_add_driver+0x154/0x248
[    6.634168]  driver_register+0x6c/0x128
[    6.635205]  __pci_register_driver+0x4c/0x58
[    6.635628]  qxl_init+0x48/0x1000 [qxl]
[    6.636013]  do_one_initcall+0x50/0x240
[    6.636390]  do_init_module+0x60/0x238
[    6.636768]  load_module+0x2458/0x2900
[    6.637136]  __do_sys_finit_module+0xbc/0x128
[    6.637561]  __arm64_sys_finit_module+0x28/0x38
[    6.638004]  invoke_syscall+0x74/0xf0
[    6.638366]  el0_svc_common.constprop.0+0x58/0x1a8
[    6.638836]  do_el0_svc+0x2c/0x90
[    6.639216]  el0_svc+0x40/0x190
[    6.639526]  el0t_64_sync_handler+0xb0/0xb8
[    6.639934]  el0t_64_sync+0x1a4/0x1a8
[    6.640294] Code: 910003fd f9484804 f9400c23 8b050084 (f809c083)
[    6.640889] ---[ end trace 95615d89b7c87f95 ]---

Signed-off-by: Cong Liu 
---
  drivers/gpu/drm/qxl/qxl_kms.c | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/qxl/qxl_kms.c 
b/drivers/gpu/drm/qxl/qxl_kms.c

index 4dc5ad13f12c..0e61ac04d8ad 100644
--- a/drivers/gpu/drm/qxl/qxl_kms.c
+++ b/drivers/gpu/drm/qxl/qxl_kms.c
@@ -165,7 +165,11 @@ int qxl_device_init(struct qxl_device *qdev,
   (int)qdev->surfaceram_size / 1024,
   (sb == 4) ? "64bit" : "32bit");
+#ifdef CONFIG_ARM64
+    qdev->rom = ioremap_cache(qdev->rom_base, qdev->rom_size);
+#else
  qdev->rom = ioremap(qdev->rom_base, qdev->rom_size);
+#endif
  if (!qdev->rom) {
  pr_err("Unable to ioremap ROM\n");
  r = -ENOMEM;
@@ -183,9 +187,15 @@ int qxl_device_init(struct qxl_device *qdev,
  goto rom_unmap;
  }
+#ifdef CONFIG_ARM64
+    qdev->ram_header = ioremap_cache(qdev->vram_base +
+   qdev->rom->ram_header_offset,
+   sizeof(*qdev->ram_header));
+#else
  qdev->ram_header = ioremap(qdev->vram_base +
 qdev->rom->ram_header_offset,
 sizeof(*qdev->ram_header));
+#endif
  if (!qdev->ram_header) {
  DRM_ERROR("Unable to ioremap RAM header\n");
  r = -ENOMEM;




Re: [PATCH v2 4/8] drm/virtio: Improve DMA API usage for shmem BOs

2022-03-16 Thread Robin Murphy

On 2022-03-14 22:42, Dmitry Osipenko wrote:

DRM API requires the DRM's driver to be backed with the device that can
be used for generic DMA operations. The VirtIO-GPU device can't perform
DMA operations if it uses PCI transport because PCI device driver creates
a virtual VirtIO-GPU device that isn't associated with the PCI. Use PCI's
GPU device for the DRM's device instead of the VirtIO-GPU device and drop
DMA-related hacks from the VirtIO-GPU driver.

Signed-off-by: Dmitry Osipenko 
---
  drivers/gpu/drm/virtio/virtgpu_drv.c| 22 +++---
  drivers/gpu/drm/virtio/virtgpu_drv.h|  5 +--
  drivers/gpu/drm/virtio/virtgpu_kms.c|  7 ++--
  drivers/gpu/drm/virtio/virtgpu_object.c | 56 +
  drivers/gpu/drm/virtio/virtgpu_vq.c | 13 +++---
  5 files changed, 37 insertions(+), 66 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c 
b/drivers/gpu/drm/virtio/virtgpu_drv.c
index 5f25a8d15464..8449dad3e65c 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.c
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
@@ -46,9 +46,9 @@ static int virtio_gpu_modeset = -1;
  MODULE_PARM_DESC(modeset, "Disable/Enable modesetting");
  module_param_named(modeset, virtio_gpu_modeset, int, 0400);
  
-static int virtio_gpu_pci_quirk(struct drm_device *dev, struct virtio_device *vdev)

+static int virtio_gpu_pci_quirk(struct drm_device *dev)
  {
-   struct pci_dev *pdev = to_pci_dev(vdev->dev.parent);
+   struct pci_dev *pdev = to_pci_dev(dev->dev);
	const char *pname = dev_name(&pdev->dev);
bool vga = (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
char unique[20];
@@ -101,6 +101,7 @@ static int virtio_gpu_pci_quirk(struct drm_device *dev, 
struct virtio_device *vd
  static int virtio_gpu_probe(struct virtio_device *vdev)
  {
struct drm_device *dev;
+   struct device *dma_dev;
int ret;
  
  	if (drm_firmware_drivers_only() && virtio_gpu_modeset == -1)

@@ -109,18 +110,29 @@ static int virtio_gpu_probe(struct virtio_device *vdev)
if (virtio_gpu_modeset == 0)
return -EINVAL;
  
-	dev = drm_dev_alloc(&driver, &vdev->dev);

+   /*
+* If GPU's parent is a PCI device, then we will use this PCI device
+* for the DRM's driver device because GPU won't have PCI's IOMMU DMA
+* ops in this case since GPU device is sitting on a separate (from PCI)
+* virtio-bus.
+*/
+   if (!strcmp(vdev->dev.parent->bus->name, "pci"))


Nit: dev_is_pci() ?

However, what about other VirtIO transports? Wouldn't virtio-mmio with 
F_ACCESS_PLATFORM be in a similar situation?


Robin.


+   dma_dev = vdev->dev.parent;
+   else
+   dma_dev = &vdev->dev;
+
+   dev = drm_dev_alloc(&driver, dma_dev);
if (IS_ERR(dev))
return PTR_ERR(dev);
vdev->priv = dev;
  
  	if (!strcmp(vdev->dev.parent->bus->name, "pci")) {

-   ret = virtio_gpu_pci_quirk(dev, vdev);
+   ret = virtio_gpu_pci_quirk(dev);
if (ret)
goto err_free;
}
  
-	ret = virtio_gpu_init(dev);

+   ret = virtio_gpu_init(vdev, dev);
if (ret)
goto err_free;
  
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h

index 0a194aaad419..b2d93cb12ebf 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -100,8 +100,6 @@ struct virtio_gpu_object {
  
  struct virtio_gpu_object_shmem {

struct virtio_gpu_object base;
-   struct sg_table *pages;
-   uint32_t mapped;
  };
  
  struct virtio_gpu_object_vram {

@@ -214,7 +212,6 @@ struct virtio_gpu_drv_cap_cache {
  };
  
  struct virtio_gpu_device {

-   struct device *dev;
struct drm_device *ddev;
  
  	struct virtio_device *vdev;

@@ -282,7 +279,7 @@ extern struct drm_ioctl_desc 
virtio_gpu_ioctls[DRM_VIRTIO_NUM_IOCTLS];
  void virtio_gpu_create_context(struct drm_device *dev, struct drm_file *file);
  
  /* virtgpu_kms.c */

-int virtio_gpu_init(struct drm_device *dev);
+int virtio_gpu_init(struct virtio_device *vdev, struct drm_device *dev);
  void virtio_gpu_deinit(struct drm_device *dev);
  void virtio_gpu_release(struct drm_device *dev);
  int virtio_gpu_driver_open(struct drm_device *dev, struct drm_file *file);
diff --git a/drivers/gpu/drm/virtio/virtgpu_kms.c 
b/drivers/gpu/drm/virtio/virtgpu_kms.c
index 3313b92db531..0d1e3eb61bee 100644
--- a/drivers/gpu/drm/virtio/virtgpu_kms.c
+++ b/drivers/gpu/drm/virtio/virtgpu_kms.c
@@ -110,7 +110,7 @@ static void virtio_gpu_get_capsets(struct virtio_gpu_device 
*vgdev,
vgdev->num_capsets = num_capsets;
  }
  
-int virtio_gpu_init(struct drm_device *dev)

+int virtio_gpu_init(struct virtio_device *vdev, struct drm_device *dev)
  {
static vq_callback_t *callbacks[] = {
virtio_gpu_ctrl_ack, virtio_gpu_cursor_ack
@@ -123,7 +123,7 @@ int virtio_gpu_init(struct drm_device *dev)
u32 num_scanouts, 

Re: [PATCH v2] iommu/iova: Separate out rcache init

2022-02-14 Thread Robin Murphy

On 2022-02-03 09:59, John Garry wrote:

Currently the rcache structures are allocated for all IOVA domains, even if
they do not use "fast" alloc+free interface. This is wasteful of memory.

In addition, fails in init_iova_rcaches() are not handled safely, which is
less than ideal.

Make "fast" users call a separate rcache init explicitly, which includes
error checking.


Reviewed-by: Robin Murphy 


Signed-off-by: John Garry 
---
Differences to v1:
- Drop stubs for iova_domain_init_rcaches() and iova_domain_free_rcaches()
- Use put_iova_domain() in vdpa code

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d85d54f2b549..b22034975301 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -525,6 +525,7 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
struct iommu_dma_cookie *cookie = domain->iova_cookie;
unsigned long order, base_pfn;
struct iova_domain *iovad;
+   int ret;
  
  	if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE)

return -EINVAL;
@@ -559,6 +560,9 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
}
  
  	init_iova_domain(iovad, 1UL << order, base_pfn);

+   ret = iova_domain_init_rcaches(iovad);
+   if (ret)
+   return ret;
  
  	/* If the FQ fails we can simply fall back to strict mode */

if (domain->type == IOMMU_DOMAIN_DMA_FQ && iommu_dma_init_fq(domain))
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index b28c9435b898..7e9c3a97c040 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -15,13 +15,14 @@
  /* The anchor node sits above the top of the usable address space */
  #define IOVA_ANCHOR   ~0UL
  
+#define IOVA_RANGE_CACHE_MAX_SIZE 6	/* log of max cached IOVA range size (in pages) */

+
  static bool iova_rcache_insert(struct iova_domain *iovad,
   unsigned long pfn,
   unsigned long size);
  static unsigned long iova_rcache_get(struct iova_domain *iovad,
 unsigned long size,
 unsigned long limit_pfn);
-static void init_iova_rcaches(struct iova_domain *iovad);
  static void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain 
*iovad);
  static void free_iova_rcaches(struct iova_domain *iovad);
  
@@ -64,8 +65,6 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,

iovad->anchor.pfn_lo = iovad->anchor.pfn_hi = IOVA_ANCHOR;
	rb_link_node(&iovad->anchor.node, NULL, &iovad->rbroot.rb_node);
	rb_insert_color(&iovad->anchor.node, &iovad->rbroot);
-	cpuhp_state_add_instance_nocalls(CPUHP_IOMMU_IOVA_DEAD, &iovad->cpuhp_dead);
-   init_iova_rcaches(iovad);
  }
  EXPORT_SYMBOL_GPL(init_iova_domain);
  
@@ -488,6 +487,13 @@ free_iova_fast(struct iova_domain *iovad, unsigned long pfn, unsigned long size)

  }
  EXPORT_SYMBOL_GPL(free_iova_fast);
  
+static void iova_domain_free_rcaches(struct iova_domain *iovad)

+{
+   cpuhp_state_remove_instance_nocalls(CPUHP_IOMMU_IOVA_DEAD,
+   &iovad->cpuhp_dead);
+   free_iova_rcaches(iovad);
+}
+
  /**
   * put_iova_domain - destroys the iova domain
   * @iovad: - iova domain in question.
@@ -497,9 +503,9 @@ void put_iova_domain(struct iova_domain *iovad)
  {
struct iova *iova, *tmp;
  
-	cpuhp_state_remove_instance_nocalls(CPUHP_IOMMU_IOVA_DEAD,

-   &iovad->cpuhp_dead);
-   free_iova_rcaches(iovad);
+   if (iovad->rcaches)
+   iova_domain_free_rcaches(iovad);
+
	rbtree_postorder_for_each_entry_safe(iova, tmp, &iovad->rbroot, node)
free_iova_mem(iova);
  }
@@ -608,6 +614,7 @@ EXPORT_SYMBOL_GPL(reserve_iova);
   */
  
  #define IOVA_MAG_SIZE 128

+#define MAX_GLOBAL_MAGS 32 /* magazines per bin */
  
  struct iova_magazine {

unsigned long size;
@@ -620,6 +627,13 @@ struct iova_cpu_rcache {
struct iova_magazine *prev;
  };
  
+struct iova_rcache {

+   spinlock_t lock;
+   unsigned long depot_size;
+   struct iova_magazine *depot[MAX_GLOBAL_MAGS];
+   struct iova_cpu_rcache __percpu *cpu_rcaches;
+};
+
  static struct iova_magazine *iova_magazine_alloc(gfp_t flags)
  {
return kzalloc(sizeof(struct iova_magazine), flags);
@@ -693,28 +707,54 @@ static void iova_magazine_push(struct iova_magazine *mag, 
unsigned long pfn)
mag->pfns[mag->size++] = pfn;
  }
  
-static void init_iova_rcaches(struct iova_domain *iovad)

+int iova_domain_init_rcaches(struct iova_domain *iovad)
  {
-   struct iova_cpu_rcache *cpu_rcache;
-   struct iova_rcache *rcache;
unsigned int cpu;
-   int i;
+   int i, ret;
+
+   iovad->rcaches = kcalloc(IOVA_RANGE_CACHE_MAX_SIZE,
+sizeof(struct iova_rcache),
+   

Re: [PATCH] iommu/iova: Separate out rcache init

2022-01-28 Thread Robin Murphy

On 2022-01-28 11:32, John Garry wrote:

On 26/01/2022 17:00, Robin Murphy wrote:

As above, I vote for just forward-declaring the free routine in iova.c
and keeping it entirely private.


BTW, speaking of forward declarations, it's possible to remove all the 
forward declarations in iova.c now that the FQ code is gone - but with a 
good bit of rearranging. However I am not sure how much people care 
about that or whether the code layout is sane...


Indeed, I was very tempted to raise the question there of whether there 
was any more cleanup or refactoring that could be done to justify 
collecting all the rcache code together at the top of iova.c. But in the 
end I didn't, so my opinion still remains a secret...


Robin.


Re: [PATCH] iommu/iova: Separate out rcache init

2022-01-26 Thread Robin Murphy

On 2022-01-26 13:55, John Garry wrote:

Currently the rcache structures are allocated for all IOVA domains, even if
they do not use "fast" alloc+free interface. This is wasteful of memory.

In addition, failures in init_iova_rcaches() are not handled safely, which is
less than ideal.

Make "fast" users call a separate rcache init explicitly, which includes
error checking.

Signed-off-by: John Garry 


Mangled patch? (no "---" separator here)

Overall this looks great, just a few comments further down...


diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 3a46f2cc9e5d..dd066d990809 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -525,6 +525,7 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
struct iommu_dma_cookie *cookie = domain->iova_cookie;
unsigned long order, base_pfn;
struct iova_domain *iovad;
+   int ret;
  
  	if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE)

return -EINVAL;
@@ -559,6 +560,9 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
}
  
  	init_iova_domain(iovad, 1UL << order, base_pfn);

+   ret = iova_domain_init_rcaches(iovad);
+   if (ret)
+   return ret;
  
  	/* If the FQ fails we can simply fall back to strict mode */

if (domain->type == IOMMU_DOMAIN_DMA_FQ && iommu_dma_init_fq(domain))
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index b28c9435b898..d3adc6ea5710 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -15,13 +15,14 @@
  /* The anchor node sits above the top of the usable address space */
  #define IOVA_ANCHOR   ~0UL
  
+#define IOVA_RANGE_CACHE_MAX_SIZE 6	/* log of max cached IOVA range size (in pages) */

+
  static bool iova_rcache_insert(struct iova_domain *iovad,
   unsigned long pfn,
   unsigned long size);
  static unsigned long iova_rcache_get(struct iova_domain *iovad,
 unsigned long size,
 unsigned long limit_pfn);
-static void init_iova_rcaches(struct iova_domain *iovad);
  static void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain 
*iovad);
  static void free_iova_rcaches(struct iova_domain *iovad);
  
@@ -64,8 +65,6 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,

iovad->anchor.pfn_lo = iovad->anchor.pfn_hi = IOVA_ANCHOR;
rb_link_node(&iovad->anchor.node, NULL, &iovad->rbroot.rb_node);
rb_insert_color(&iovad->anchor.node, &iovad->rbroot);
-   cpuhp_state_add_instance_nocalls(CPUHP_IOMMU_IOVA_DEAD, 
&iovad->cpuhp_dead);
-   init_iova_rcaches(iovad);
  }
  EXPORT_SYMBOL_GPL(init_iova_domain);
  
@@ -497,9 +496,9 @@ void put_iova_domain(struct iova_domain *iovad)

  {
struct iova *iova, *tmp;
  
-	cpuhp_state_remove_instance_nocalls(CPUHP_IOMMU_IOVA_DEAD,

-   &iovad->cpuhp_dead);
-   free_iova_rcaches(iovad);
+   if (iovad->rcaches)
+   iova_domain_free_rcaches(iovad);
+
	rbtree_postorder_for_each_entry_safe(iova, tmp, &iovad->rbroot, node)
free_iova_mem(iova);
  }
@@ -608,6 +607,7 @@ EXPORT_SYMBOL_GPL(reserve_iova);
   */
  
  #define IOVA_MAG_SIZE 128

+#define MAX_GLOBAL_MAGS 32 /* magazines per bin */
  
  struct iova_magazine {

unsigned long size;
@@ -620,6 +620,13 @@ struct iova_cpu_rcache {
struct iova_magazine *prev;
  };
  
+struct iova_rcache {

+   spinlock_t lock;
+   unsigned long depot_size;
+   struct iova_magazine *depot[MAX_GLOBAL_MAGS];
+   struct iova_cpu_rcache __percpu *cpu_rcaches;
+};
+
  static struct iova_magazine *iova_magazine_alloc(gfp_t flags)
  {
return kzalloc(sizeof(struct iova_magazine), flags);
@@ -693,28 +700,62 @@ static void iova_magazine_push(struct iova_magazine *mag, 
unsigned long pfn)
mag->pfns[mag->size++] = pfn;
  }
  
-static void init_iova_rcaches(struct iova_domain *iovad)

+int iova_domain_init_rcaches(struct iova_domain *iovad)
  {
-   struct iova_cpu_rcache *cpu_rcache;
-   struct iova_rcache *rcache;
unsigned int cpu;
-   int i;
+   int i, ret;
+
+   iovad->rcaches = kcalloc(IOVA_RANGE_CACHE_MAX_SIZE,
+sizeof(struct iova_rcache),
+GFP_KERNEL);
+   if (!iovad->rcaches)
+   return -ENOMEM;
  
  	for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {

+   struct iova_cpu_rcache *cpu_rcache;
+   struct iova_rcache *rcache;
+
rcache = &iovad->rcaches[i];
spin_lock_init(&rcache->lock);
rcache->depot_size = 0;
-   rcache->cpu_rcaches = __alloc_percpu(sizeof(*cpu_rcache), 
cache_line_size());
-   if (WARN_ON(!rcache->cpu_rcaches))
-   continue;
+   rcache->cpu_rcaches = 

Re: [PATCH 4/5] iommu: Separate IOVA rcache memories from iova_domain structure

2021-12-20 Thread Robin Murphy

Hi John,

On 2021-12-20 08:49, John Garry wrote:

On 24/09/2021 11:01, John Garry wrote:

Only dma-iommu.c and vdpa actually use the "fast" mode of IOVA alloc and
free. As such, it's wasteful that all other IOVA domains hold the rcache
memories.

In addition, the current IOVA domain init implementation is poor
(init_iova_domain()), in that errors are ignored and not passed to the
caller. The only errors can come from the IOVA rcache init, and fixing up
all the IOVA domain init callsites to handle the errors would take some
work.

Separate the IOVA rcache out of the IOVA domain, and create a new IOVA
domain structure, iova_caching_domain.

Signed-off-by: John Garry 


Hi Robin,

Do you have any thoughts on this patch? The decision is whether we stick 
with a single iova domain structure or support this super structure for 
iova domains which support the rcache. I did not try the former - it 
would be do-able but I am not sure on how it would look.


TBH I feel inclined to take the simpler approach of just splitting the 
rcache array to a separate allocation, making init_iova_rcaches() public 
(with a proper return value), and tweaking put_iova_domain() to make 
rcache cleanup conditional. A residual overhead of 3 extra pointers in 
iova_domain doesn't seem like *too* much for non-DMA-API users to bear. 
Unless you want to try generalising the rcache mechanism completely away 
from IOVA API specifics, it doesn't seem like there's really enough to 
justify the bother of having its own distinct abstraction layer.


Cheers,
Robin.


Re: [PATCH v2] iova: Move fast alloc size roundup into alloc_iova_fast()

2021-12-07 Thread Robin Murphy

On 2021-12-07 11:17, John Garry wrote:

It really is a property of the IOVA rcache code that we need to alloc a
power-of-2 size, so relocate the functionality to resize into
alloc_iova_fast(), rather than the callsites.


I'd still much prefer to resolve the issue that there shouldn't *be* 
more than one caller in the first place, but hey.


Acked-by: Robin Murphy 


Signed-off-by: John Garry 
Acked-by: Will Deacon 
Reviewed-by: Xie Yongji 
Acked-by: Jason Wang 
Acked-by: Michael S. Tsirkin 
---
Differences to v1:
- Separate out from original series which conflicts with Robin's IOVA FQ work:
   
https://lore.kernel.org/linux-iommu/1632477717-5254-1-git-send-email-john.ga...@huawei.com/
- Add tags - thanks!

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index b42e38a0dbe2..84dee53fe892 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -442,14 +442,6 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain 
*domain,
  
  	shift = iova_shift(iovad);

iova_len = size >> shift;
-   /*
-* Freeing non-power-of-two-sized allocations back into the IOVA caches
-* will come back to bite us badly, so we have to waste a bit of space
-* rounding up anything cacheable to make sure that can't happen. The
-* order of the unadjusted size will still match upon freeing.
-*/
-   if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
-   iova_len = roundup_pow_of_two(iova_len);
  
  	dma_limit = min_not_zero(dma_limit, dev->bus_dma_limit);
  
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c

index 9e8bc802ac05..ff567cbc42f7 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -497,6 +497,15 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned long 
size,
unsigned long iova_pfn;
struct iova *new_iova;
  
+	/*

+* Freeing non-power-of-two-sized allocations back into the IOVA caches
+* will come back to bite us badly, so we have to waste a bit of space
+* rounding up anything cacheable to make sure that can't happen. The
+* order of the unadjusted size will still match upon freeing.
+*/
+   if (size < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
+   size = roundup_pow_of_two(size);
+
iova_pfn = iova_rcache_get(iovad, size, limit_pfn + 1);
if (iova_pfn)
return iova_pfn;
diff --git a/drivers/vdpa/vdpa_user/iova_domain.c 
b/drivers/vdpa/vdpa_user/iova_domain.c
index 1daae2608860..2b1143f11d8f 100644
--- a/drivers/vdpa/vdpa_user/iova_domain.c
+++ b/drivers/vdpa/vdpa_user/iova_domain.c
@@ -292,14 +292,6 @@ vduse_domain_alloc_iova(struct iova_domain *iovad,
unsigned long iova_len = iova_align(iovad, size) >> shift;
unsigned long iova_pfn;
  
-	/*

-* Freeing non-power-of-two-sized allocations back into the IOVA caches
-* will come back to bite us badly, so we have to waste a bit of space
-* rounding up anything cacheable to make sure that can't happen. The
-* order of the unadjusted size will still match upon freeing.
-*/
-   if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
-   iova_len = roundup_pow_of_two(iova_len);
iova_pfn = alloc_iova_fast(iovad, iova_len, limit >> shift, true);
  
  	return iova_pfn << shift;
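
As a worked example of the rounding that this patch centralises in
alloc_iova_fast() (an illustrative sketch only, not part of the patch;
IOVA_RANGE_CACHE_MAX_SIZE is 6 in iova.c, so the rcaches cover 1- to
32-page allocations):

#include <linux/log2.h>

#define IOVA_RANGE_CACHE_MAX_SIZE 6	/* same value as iova.c */

/* What the relocated check does to a requested size before the rcache lookup */
static unsigned long cacheable_iova_size(unsigned long size)
{
	if (size < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
		size = roundup_pow_of_two(size);
	/* e.g. 5 -> 8 (cacheable), 32 -> 32 (cacheable), 40 -> 40 (never cached) */
	return size;
}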





Re: [PATCH 0/5] iommu: Some IOVA code reorganisation

2021-11-16 Thread Robin Murphy

On 2021-11-16 14:21, John Garry wrote:

On 04/10/2021 12:44, Will Deacon wrote:

On Fri, Sep 24, 2021 at 06:01:52PM +0800, John Garry wrote:

The IOVA domain structure is a bit overloaded, holding:
- IOVA tree management
- FQ control
- IOVA rcache memories

Indeed only a couple of IOVA users use the rcache, and only dma-iommu.c
uses the FQ feature.

This series separates out that structure. In addition, it moves the FQ
code into dma-iommu.c. This is not strictly necessary, but it does make
it easier for the FQ code to look up the rcache domain.

The rcache code stays where it is, as it may be reworked in future, so
there is not much point in relocating and then discarding.

This topic was initially discussed and suggested (I think) by Robin 
here:
https://lore.kernel.org/linux-iommu/1d06eda1-9961-d023-f5e7-fe87e768f...@arm.com/ 


It would be useful to have Robin's Ack on patches 2-4. The implementation
looks straightforward to me, but the thread above isn't very clear about
what is being suggested.


Hi Robin,

Just wondering if you had made any progress on your FQ code rework or 
your own re-org?


Hey John - as it happens I started hacking on that in earnest about half 
an hour ago, aiming to get something out later this week.


Cheers,
Robin.

I wasn't planning on progressing 
https://lore.kernel.org/linux-iommu/1626259003-201303-1-git-send-email-john.ga...@huawei.com/ 
until this is done first (and that is still a big issue), even though 
not strictly necessary.


Thanks,
John



Re: [PATCH 0/5] iommu: Some IOVA code reorganisation

2021-10-04 Thread Robin Murphy

On 2021-10-04 12:44, Will Deacon wrote:

On Fri, Sep 24, 2021 at 06:01:52PM +0800, John Garry wrote:

The IOVA domain structure is a bit overloaded, holding:
- IOVA tree management
- FQ control
- IOVA rcache memories

Indeed only a couple of IOVA users use the rcache, and only dma-iommu.c
uses the FQ feature.

This series separates out that structure. In addition, it moves the FQ
code into dma-iommu.c. This is not strictly necessary, but it does make
it easier for the FQ code to look up the rcache domain.

The rcache code stays where it is, as it may be reworked in future, so
there is not much point in relocating and then discarding.

This topic was initially discussed and suggested (I think) by Robin here:
https://lore.kernel.org/linux-iommu/1d06eda1-9961-d023-f5e7-fe87e768f...@arm.com/


It would be useful to have Robin's Ack on patches 2-4. The implementation
looks straightforward to me, but the thread above isn't very clear about
what is being suggested.


FWIW I actually got about half-way through writing my own equivalent of 
patches 2-3, except tackling it from the other direction - simplifying 
the FQ code *before* moving whatever was left to iommu-dma, then I got 
side-tracked trying to make io-pgtable use that freelist properly, and 
then I've been on holiday the last 2 weeks. I've got other things to 
catch up on first but I'll try to get to this later this week.



To play devil's advocate: there aren't many direct users of the iovad code:
either they'll die out entirely (and everybody will use the dma-iommu code)
and it's fine having the flush queue code where it is, or we'll get more
users and the likelihood of somebody else wanting flush queues increases.


I think the FQ code is mostly just here as a historical artefact, since 
the IOVA allocator was the only thing common to the Intel and AMD DMA 
ops when the common FQ implementation was factored out of those, so 
although it's essentially orthogonal it was still related enough that it 
was an easy place to stick it.


Cheers,
Robin.


Re: [PATCH v10 01/17] iova: Export alloc_iova_fast() and free_iova_fast()

2021-08-04 Thread Robin Murphy

On 2021-08-04 06:02, Yongji Xie wrote:

On Tue, Aug 3, 2021 at 6:54 PM Robin Murphy  wrote:


On 2021-08-03 09:54, Yongji Xie wrote:

On Tue, Aug 3, 2021 at 3:41 PM Jason Wang  wrote:



在 2021/7/29 下午3:34, Xie Yongji 写道:

Export alloc_iova_fast() and free_iova_fast() so that
some modules can use them to improve iova allocation efficiency.



It's better to explain why alloc_iova() is not sufficient here.



Fine.


What I fail to understand from the later patches is what the IOVA domain
actually represents. If the "device" is a userspace process then
logically the "IOVA" would be the userspace address, so presumably
somewhere you're having to translate between this arbitrary address
space and actual usable addresses - if you're worried about efficiency
surely it would be even better to not do that?



Yes, userspace daemon needs to translate the "IOVA" in a DMA
descriptor to the VA (from mmap(2)). But this actually doesn't affect
performance since it's an identical mapping in most cases.


I'm not familiar with the vhost_iotlb stuff, but it looks suspiciously 
like you're walking yet another tree to make those translations. Even if 
the buffer can be mapped all at once with a fixed offset such that each 
DMA mapping call doesn't need a lookup for each individual "IOVA" - that 
might be what's happening already, but it's a bit hard to follow just 
reading the patches in my mail client - vhost_iotlb_add_range() doesn't 
look like it's super-cheap to call, and you're serialising on a lock for 
that.


My main point, though, is that if you've already got something else 
keeping track of the actual addresses, then the way you're using an 
iova_domain appears to be something you could do with a trivial bitmap 
allocator. That's why I don't buy the efficiency argument. The main 
design points of the IOVA allocator are to manage large address spaces 
while trying to maximise spatial locality to minimise the underlying 
pagetable usage, and allocating with a flexible limit to support 
multiple devices with different addressing capabilities in the same 
address space. If none of those aspects are relevant to the use-case - 
which AFAICS appears to be true here - then as a general-purpose 
resource allocator it's rubbish and has an unreasonably massive memory 
overhead and there are many, many better choices.
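
For comparison, the sort of "trivial bitmap allocator" being alluded to is
roughly this (a minimal sketch, assuming a fixed bounce buffer of NR_SLOTS
pages and no alignment requirements; none of these names exist in the VDUSE
code):

#include <linux/bitmap.h>
#include <linux/spinlock.h>

#define NR_SLOTS	4096	/* illustrative bounce-buffer size, in pages */

static DECLARE_BITMAP(slot_map, NR_SLOTS);
static DEFINE_SPINLOCK(slot_lock);

/* Allocate nr_pages contiguous page slots; returns the first slot, or -1 */
static long example_alloc_slots(unsigned int nr_pages)
{
	unsigned long start;

	spin_lock(&slot_lock);
	start = bitmap_find_next_zero_area(slot_map, NR_SLOTS, 0, nr_pages, 0);
	if (start < NR_SLOTS)
		bitmap_set(slot_map, start, nr_pages);
	spin_unlock(&slot_lock);

	return start < NR_SLOTS ? start : -1;
}

static void example_free_slots(unsigned long start, unsigned int nr_pages)
{
	spin_lock(&slot_lock);
	bitmap_clear(slot_map, start, nr_pages);
	spin_unlock(&slot_lock);
}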


FWIW I've recently started thinking about moving all the caching stuff 
out of iova_domain and into the iommu-dma layer since it's now a giant 
waste of space for all the other current IOVA users.



Presumably userspace doesn't have any concern about alignment and the
things we have to worry about for the DMA API in general, so it's pretty
much just allocating slots in a buffer, and there are far more effective
ways to do that than a full-blown address space manager.


Considering iova allocation efficiency, I think the iova allocator is
better here. In most cases, we don't even need to hold a spin lock
during iova allocation.


If you're going
to reuse any infrastructure I'd have expected it to be SWIOTLB rather
than the IOVA allocator. Because, y'know, you're *literally implementing
a software I/O TLB* ;)



But actually what we can reuse in SWIOTLB is the IOVA allocator.


Huh? Those are completely unrelated and orthogonal things - SWIOTLB does 
not use an external allocator (see find_slots()). By SWIOTLB I mean 
specifically the library itself, not dma-direct or any of the other 
users built around it. The functionality for managing slots in a buffer 
and bouncing data in and out can absolutely be reused - that's why users 
like the Xen and iommu-dma code *are* reusing it instead of open-coding 
their own versions.



And
the IOVA management in SWIOTLB is not what we want. For example,
SWIOTLB allocates and uses contiguous memory for bouncing, which is
not necessary in VDUSE case.


alloc_iova() allocates a contiguous (in IOVA address) region of space. 
In vduse_domain_map_page() you use it to allocate a contiguous region of 
space from your bounce buffer. Can you clarify how that is fundamentally 
different from allocating a contiguous region of space from a bounce 
buffer? Nobody's saying the underlying implementation details of where 
the buffer itself comes from can't be tweaked.



And VDUSE needs coherent mapping which is
not supported by the SWIOTLB. Besides, the SWIOTLB works in singleton
mode (designed for platform IOMMU), but VDUSE is based on on-chip
IOMMU (supports multiple instances).

That's not entirely true - the IOMMU bounce buffering scheme introduced 
in intel-iommu and now moved into the iommu-dma layer was already a step 
towards something conceptually similar. It does still rely on stealing 
the underlying pages from the global SWIOTLB pool at the moment, but the 
bouncing is effectively done in a per-IOMMU-domain context.


The next step is currently queued in linux-next, wherein we can now have 
individual per-device SWIOTLB pools. In fact a

Re: [PATCH v10 01/17] iova: Export alloc_iova_fast() and free_iova_fast()

2021-08-03 Thread Robin Murphy

On 2021-08-03 09:54, Yongji Xie wrote:

On Tue, Aug 3, 2021 at 3:41 PM Jason Wang  wrote:



在 2021/7/29 下午3:34, Xie Yongji 写道:

Export alloc_iova_fast() and free_iova_fast() so that
some modules can use them to improve iova allocation efficiency.



It's better to explain why alloc_iova() is not sufficient here.



Fine.


What I fail to understand from the later patches is what the IOVA domain 
actually represents. If the "device" is a userspace process then 
logically the "IOVA" would be the userspace address, so presumably 
somewhere you're having to translate between this arbitrary address 
space and actual usable addresses - if you're worried about efficiency 
surely it would be even better to not do that?


Presumably userspace doesn't have any concern about alignment and the 
things we have to worry about for the DMA API in general, so it's pretty 
much just allocating slots in a buffer, and there are far more effective 
ways to do that than a full-blown address space manager. If you're going 
to reuse any infrastructure I'd have expected it to be SWIOTLB rather 
than the IOVA allocator. Because, y'know, you're *literally implementing 
a software I/O TLB* ;)


Robin.

Re: [RFC v1 3/8] intel/vt-d: make DMAR table parsing code more flexible

2021-07-09 Thread Robin Murphy

On 2021-07-09 12:43, Wei Liu wrote:

Microsoft Hypervisor provides a set of hypercalls to manage device
domains. The root kernel should parse the DMAR so that it can program
the IOMMU (with hypercalls) correctly.

The DMAR code was designed to work with Intel IOMMU only. Add two more
parameters to make it useful to Microsoft Hypervisor. Microsoft
Hypervisor does not need the DMAR parsing code to allocate an Intel
IOMMU structure; it also wishes to always reparse the DMAR table even
after it has been parsed before.


We've recently defined the VIOT table for describing paravirtualised 
IOMMUs - would it make more sense to extend that to support the 
Microsoft implementation than to abuse a hardware-specific table? Am I 
right in assuming said hypervisor isn't intended to only ever run on 
Intel hardware?


Robin.


Adjust Intel IOMMU code to use the new dmar_table_init. There should be
no functional change to Intel IOMMU code.

Signed-off-by: Wei Liu 
---
We may be able to combine alloc and force_parse?
---
  drivers/iommu/intel/dmar.c  | 38 -
  drivers/iommu/intel/iommu.c |  2 +-
  drivers/iommu/intel/irq_remapping.c |  2 +-
  include/linux/dmar.h|  2 +-
  4 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 84057cb9596c..bd72f47c728b 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -408,7 +408,8 @@ dmar_find_dmaru(struct acpi_dmar_hardware_unit *drhd)
   * structure which uniquely represent one DMA remapping hardware unit
   * present in the platform
   */
-static int dmar_parse_one_drhd(struct acpi_dmar_header *header, void *arg)
+static int dmar_parse_one_drhd_internal(struct acpi_dmar_header *header,
+   void *arg, bool alloc)
  {
struct acpi_dmar_hardware_unit *drhd;
struct dmar_drhd_unit *dmaru;
@@ -440,12 +441,14 @@ static int dmar_parse_one_drhd(struct acpi_dmar_header 
*header, void *arg)
return -ENOMEM;
}
  
-	ret = alloc_iommu(dmaru);

-   if (ret) {
-   dmar_free_dev_scope(&dmaru->devices,
-   &dmaru->devices_cnt);
-   kfree(dmaru);
-   return ret;
+   if (alloc) {
+   ret = alloc_iommu(dmaru);
+   if (ret) {
+   dmar_free_dev_scope(&dmaru->devices,
+   &dmaru->devices_cnt);
+   kfree(dmaru);
+   return ret;
+   }
}
dmar_register_drhd_unit(dmaru);
  
@@ -456,6 +459,16 @@ static int dmar_parse_one_drhd(struct acpi_dmar_header *header, void *arg)

return 0;
  }
  
+static int dmar_parse_one_drhd(struct acpi_dmar_header *header, void *arg)

+{
+   return dmar_parse_one_drhd_internal(header, arg, true);
+}
+
+int dmar_parse_one_drhd_noalloc(struct acpi_dmar_header *header, void *arg)
+{
+   return dmar_parse_one_drhd_internal(header, arg, false);
+}
+
  static void dmar_free_drhd(struct dmar_drhd_unit *dmaru)
  {
if (dmaru->devices && dmaru->devices_cnt)
@@ -633,7 +646,7 @@ static inline int dmar_walk_dmar_table(struct 
acpi_table_dmar *dmar,
   * parse_dmar_table - parses the DMA reporting table
   */
  static int __init
-parse_dmar_table(void)
+parse_dmar_table(bool alloc)
  {
struct acpi_table_dmar *dmar;
int drhd_count = 0;
@@ -650,6 +663,9 @@ parse_dmar_table(void)
.cb[ACPI_DMAR_TYPE_SATC] = &dmar_parse_one_satc,
};
  
+	if (!alloc)

+   cb.cb[ACPI_DMAR_TYPE_HARDWARE_UNIT] = 
&dmar_parse_one_drhd_noalloc;
+
/*
 * Do it again, earlier dmar_tbl mapping could be mapped with
 * fixed map.
@@ -840,13 +856,13 @@ void __init dmar_register_bus_notifier(void)
  }
  
  
-int __init dmar_table_init(void)

+int __init dmar_table_init(bool alloc, bool force_parse)
  {
static int dmar_table_initialized;
int ret;
  
-	if (dmar_table_initialized == 0) {

-   ret = parse_dmar_table();
+   if (dmar_table_initialized == 0 || force_parse) {
+   ret = parse_dmar_table(alloc);
if (ret < 0) {
if (ret != -ENODEV)
pr_info("Parse DMAR table failure.\n");
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index be35284a2016..a4294d310b93 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4310,7 +4310,7 @@ int __init intel_iommu_init(void)
}
  
down_write(&dmar_global_lock);

-   if (dmar_table_init()) {
+   if (dmar_table_init(true, false)) {
if (force_on)
panic("tboot: Failed to initialize DMAR table\n");
goto out_free_dmar;
diff --git a/drivers/iommu/intel/irq_remapping.c 
b/drivers/iommu/intel/irq_remapping.c
index f912fe45bea2..0e8abef862e4 100644
--- a/drivers/iommu/intel/irq_remapping.c

Re: [RFC v1 6/8] mshv: command line option to skip devices in PV-IOMMU

2021-07-09 Thread Robin Murphy

On 2021-07-09 12:43, Wei Liu wrote:

Some devices may have been claimed by the hypervisor already. One such
example is a user can assign a NIC for debugging purpose.

Ideally Linux should be able to retrieve that information, but
there is no way to do that yet. And designing that new mechanism is
going to take time.

Provide a command line option for skipping devices. This is a stopgap
solution, so it is intentionally undocumented. Hopefully we can retire
it in the future.


Huh? If the host is using a device, why the heck is it exposing any 
knowledge of that device to the guest at all, let alone allowing the 
guest to do anything that could affect its operation!?


Robin.


Signed-off-by: Wei Liu 
---
  drivers/iommu/hyperv-iommu.c | 45 
  1 file changed, 45 insertions(+)

diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 043dcff06511..353da5036387 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -349,6 +349,16 @@ static const struct irq_domain_ops 
hyperv_root_ir_domain_ops = {
  
  #ifdef CONFIG_HYPERV_ROOT_PVIOMMU
  
+/* The IOMMU will not claim these PCI devices. */

+static char *pci_devs_to_skip;
+static int __init mshv_iommu_setup_skip(char *str) {
+   pci_devs_to_skip = str;
+
+   return 0;
+}
+/* mshv_iommu_skip=(:BB:DD.F)(:BB:DD.F) */
+__setup("mshv_iommu_skip=", mshv_iommu_setup_skip);
+
  /* DMA remapping support */
  struct hv_iommu_domain {
struct iommu_domain domain;
@@ -774,6 +784,41 @@ static struct iommu_device *hv_iommu_probe_device(struct 
device *dev)
if (!dev_is_pci(dev))
return ERR_PTR(-ENODEV);
  
+	/*

+* Skip the PCI device specified in `pci_devs_to_skip`. This is a
+* temporary solution until we figure out a way to extract information
+* from the hypervisor what devices it is already using.
+*/
+   if (pci_devs_to_skip && *pci_devs_to_skip) {
+   int pos = 0;
+   int parsed;
+   int segment, bus, slot, func;
+   struct pci_dev *pdev = to_pci_dev(dev);
+
+   do {
+   parsed = 0;
+
+   sscanf(pci_devs_to_skip + pos,
+   " (%x:%x:%x.%x) %n",
+   &segment, &bus, &slot, &func, &parsed);
+
+   if (parsed <= 0)
+   break;
+
+   if (pci_domain_nr(pdev->bus) == segment &&
+   pdev->bus->number == bus &&
+   PCI_SLOT(pdev->devfn) == slot &&
+   PCI_FUNC(pdev->devfn) == func)
+   {
+   dev_info(dev, "skipped by MSHV IOMMU\n");
+   return ERR_PTR(-ENODEV);
+   }
+
+   pos += parsed;
+
+   } while (pci_devs_to_skip[pos]);
+   }
+
vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
if (!vdev)
return ERR_PTR(-ENOMEM);
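
For example, the option parsed by the sscanf() above would be passed on the
kernel command line as something like (hypothetical device addresses, purely
illustrative):

	mshv_iommu_skip=(0000:00:1f.6)(0000:03:00.0)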




Re: [PATCH v5 2/5] ACPI: Move IOMMU setup code out of IORT

2021-06-18 Thread Robin Murphy

On 2021-06-18 16:20, Jean-Philippe Brucker wrote:

Extract the code that sets up the IOMMU infrastructure from IORT, since
it can be reused by VIOT. Move it one level up into a new
acpi_iommu_configure_id() function, which calls the IORT parsing
function which in turn calls the acpi_iommu_fwspec_init() helper.


Reviewed-by: Robin Murphy 


Signed-off-by: Jean-Philippe Brucker 
---
  include/acpi/acpi_bus.h   |  3 ++
  include/linux/acpi_iort.h |  8 ++---
  drivers/acpi/arm64/iort.c | 74 +--
  drivers/acpi/scan.c   | 73 +-
  4 files changed, 86 insertions(+), 72 deletions(-)

diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 3a82faac5767..41f092a269f6 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -588,6 +588,9 @@ struct acpi_pci_root {
  
  bool acpi_dma_supported(struct acpi_device *adev);

  enum dev_dma_attr acpi_get_dma_attr(struct acpi_device *adev);
+int acpi_iommu_fwspec_init(struct device *dev, u32 id,
+  struct fwnode_handle *fwnode,
+  const struct iommu_ops *ops);
  int acpi_dma_get_range(struct device *dev, u64 *dma_addr, u64 *offset,
   u64 *size);
  int acpi_dma_configure_id(struct device *dev, enum dev_dma_attr attr,
diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
index f7f054833afd..f1f0842a2cb2 100644
--- a/include/linux/acpi_iort.h
+++ b/include/linux/acpi_iort.h
@@ -35,8 +35,7 @@ void acpi_configure_pmsi_domain(struct device *dev);
  int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id);
  /* IOMMU interface */
  int iort_dma_get_ranges(struct device *dev, u64 *size);
-const struct iommu_ops *iort_iommu_configure_id(struct device *dev,
-   const u32 *id_in);
+int iort_iommu_configure_id(struct device *dev, const u32 *id_in);
  int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head 
*head);
  phys_addr_t acpi_iort_dma_get_max_cpu_address(void);
  #else
@@ -50,9 +49,8 @@ static inline void acpi_configure_pmsi_domain(struct device 
*dev) { }
  /* IOMMU interface */
  static inline int iort_dma_get_ranges(struct device *dev, u64 *size)
  { return -ENODEV; }
-static inline const struct iommu_ops *iort_iommu_configure_id(
- struct device *dev, const u32 *id_in)
-{ return NULL; }
+static inline int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
+{ return -ENODEV; }
  static inline
  int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head 
*head)
  { return 0; }
diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index a940be1cf2af..487d1095030d 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -806,23 +806,6 @@ static struct acpi_iort_node 
*iort_get_msi_resv_iommu(struct device *dev)
return NULL;
  }
  
-static inline const struct iommu_ops *iort_fwspec_iommu_ops(struct device *dev)

-{
-   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-
-   return (fwspec && fwspec->ops) ? fwspec->ops : NULL;
-}
-
-static inline int iort_add_device_replay(struct device *dev)
-{
-   int err = 0;
-
-   if (dev->bus && !device_iommu_mapped(dev))
-   err = iommu_probe_device(dev);
-
-   return err;
-}
-
  /**
   * iort_iommu_msi_get_resv_regions - Reserved region driver helper
   * @dev: Device from iommu_get_resv_regions()
@@ -900,18 +883,6 @@ static inline bool iort_iommu_driver_enabled(u8 type)
}
  }
  
-static int arm_smmu_iort_xlate(struct device *dev, u32 streamid,

-  struct fwnode_handle *fwnode,
-  const struct iommu_ops *ops)
-{
-   int ret = iommu_fwspec_init(dev, fwnode, ops);
-
-   if (!ret)
-   ret = iommu_fwspec_add_ids(dev, &streamid, 1);
-
-   return ret;
-}
-
  static bool iort_pci_rc_supports_ats(struct acpi_iort_node *node)
  {
struct acpi_iort_root_complex *pci_rc;
@@ -946,7 +917,7 @@ static int iort_iommu_xlate(struct device *dev, struct 
acpi_iort_node *node,
return iort_iommu_driver_enabled(node->type) ?
   -EPROBE_DEFER : -ENODEV;
  
-	return arm_smmu_iort_xlate(dev, streamid, iort_fwnode, ops);

+   return acpi_iommu_fwspec_init(dev, streamid, iort_fwnode, ops);
  }
  
  struct iort_pci_alias_info {

@@ -1020,24 +991,13 @@ static int iort_nc_iommu_map_id(struct device *dev,
   * @dev: device to configure
   * @id_in: optional input id const value pointer
   *
- * Returns: iommu_ops pointer on configuration success
- *  NULL on configuration failure
+ * Returns: 0 on success, <0 on failure
   */
-const struct iommu_ops *iort_iommu_configure_id(struct device *dev,
-   const u32 *id_in)
+int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
  {

Re: [PATCH v5 1/5] ACPI: arm64: Move DMA setup operations out of IORT

2021-06-18 Thread Robin Murphy

On 2021-06-18 16:20, Jean-Philippe Brucker wrote:

Extract generic DMA setup code out of IORT, so it can be reused by VIOT.
Keep it in drivers/acpi/arm64 for now, since it could break x86
platforms that haven't run this code so far, if they have invalid
tables.


Reviewed-by: Robin Murphy 


Reviewed-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
  drivers/acpi/arm64/Makefile |  1 +
  include/linux/acpi.h|  3 +++
  include/linux/acpi_iort.h   |  6 ++---
  drivers/acpi/arm64/dma.c| 50 ++
  drivers/acpi/arm64/iort.c   | 54 ++---
  drivers/acpi/scan.c |  2 +-
  6 files changed, 66 insertions(+), 50 deletions(-)
  create mode 100644 drivers/acpi/arm64/dma.c

diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
index 6ff50f4ed947..66acbe77f46e 100644
--- a/drivers/acpi/arm64/Makefile
+++ b/drivers/acpi/arm64/Makefile
@@ -1,3 +1,4 @@
  # SPDX-License-Identifier: GPL-2.0-only
  obj-$(CONFIG_ACPI_IORT)   += iort.o
  obj-$(CONFIG_ACPI_GTDT)   += gtdt.o
+obj-y  += dma.o
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index c60745f657e9..7aaa9559cc19 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -259,9 +259,12 @@ void acpi_numa_x2apic_affinity_init(struct 
acpi_srat_x2apic_cpu_affinity *pa);
  
  #ifdef CONFIG_ARM64

  void acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa);
+void acpi_arch_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size);
  #else
  static inline void
  acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa) { }
+static inline void
+acpi_arch_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size) { }
  #endif
  
  int acpi_numa_memory_affinity_init (struct acpi_srat_mem_affinity *ma);

diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
index 1a12baa58e40..f7f054833afd 100644
--- a/include/linux/acpi_iort.h
+++ b/include/linux/acpi_iort.h
@@ -34,7 +34,7 @@ struct irq_domain *iort_get_device_domain(struct device *dev, 
u32 id,
  void acpi_configure_pmsi_domain(struct device *dev);
  int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id);
  /* IOMMU interface */
-void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *size);
+int iort_dma_get_ranges(struct device *dev, u64 *size);
  const struct iommu_ops *iort_iommu_configure_id(struct device *dev,
const u32 *id_in);
  int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head 
*head);
@@ -48,8 +48,8 @@ static inline struct irq_domain *iort_get_device_domain(
  { return NULL; }
  static inline void acpi_configure_pmsi_domain(struct device *dev) { }
  /* IOMMU interface */
-static inline void iort_dma_setup(struct device *dev, u64 *dma_addr,
- u64 *size) { }
+static inline int iort_dma_get_ranges(struct device *dev, u64 *size)
+{ return -ENODEV; }
  static inline const struct iommu_ops *iort_iommu_configure_id(
  struct device *dev, const u32 *id_in)
  { return NULL; }
diff --git a/drivers/acpi/arm64/dma.c b/drivers/acpi/arm64/dma.c
new file mode 100644
index ..f16739ad3cc0
--- /dev/null
+++ b/drivers/acpi/arm64/dma.c
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include 
+#include 
+#include 
+#include 
+
+void acpi_arch_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size)
+{
+   int ret;
+   u64 end, mask;
+   u64 dmaaddr = 0, size = 0, offset = 0;
+
+   /*
+* If @dev is expected to be DMA-capable then the bus code that created
+* it should have initialised its dma_mask pointer by this point. For
+* now, we'll continue the legacy behaviour of coercing it to the
+* coherent mask if not, but we'll no longer do so quietly.
+*/
+   if (!dev->dma_mask) {
+   dev_warn(dev, "DMA mask not set\n");
+   dev->dma_mask = &dev->coherent_dma_mask;
+   }
+
+   if (dev->coherent_dma_mask)
+   size = max(dev->coherent_dma_mask, dev->coherent_dma_mask + 1);
+   else
+   size = 1ULL << 32;
+
+   ret = acpi_dma_get_range(dev, &dmaaddr, &offset, &size);
+   if (ret == -ENODEV)
+   ret = iort_dma_get_ranges(dev, );
+   if (!ret) {
+   /*
+* Limit coherent and dma mask based on size retrieved from
+* firmware.
+*/
+   end = dmaaddr + size - 1;
+   mask = DMA_BIT_MASK(ilog2(end) + 1);
+   dev->bus_dma_limit = end;
+   dev->coherent_dma_mask = min(dev->coherent_dma_mask, mask);
+   *dev->dma_mask = min(*dev->dma_mask, mask);
+   }
+
+   *dma_addr = dmaaddr;
+   *dma_size = size;
+
+   ret = dma_direct_set_offset(dev, dmaaddr + offset, dmaaddr, size);
+
+   dev_dbg(dev, 

Re: [PATCH v5 4/5] iommu/dma: Pass address limit rather than size to iommu_setup_dma_ops()

2021-06-18 Thread Robin Murphy

On 2021-06-18 16:20, Jean-Philippe Brucker wrote:

Passing a 64-bit address width to iommu_setup_dma_ops() is valid on
virtual platforms, but isn't currently possible. The overflow check in
iommu_dma_init_domain() prevents this even when @dma_base isn't 0. Pass
a limit address instead of a size, so callers don't have to fake a size
to work around the check.

The base and limit parameters are being phased out, because:
* they are redundant for x86 callers. dma-iommu already reserves the
   first page, and the upper limit is already in domain->geometry.
* they can now be obtained from dev->dma_range_map on Arm.
But removing them on Arm isn't completely straightforward so is left for
future work. As an intermediate step, simplify the x86 callers by
passing dummy limits.


Reviewed-by: Robin Murphy 


Signed-off-by: Jean-Philippe Brucker 
---
  include/linux/dma-iommu.h   |  4 ++--
  arch/arm64/mm/dma-mapping.c |  2 +-
  drivers/iommu/amd/iommu.c   |  2 +-
  drivers/iommu/dma-iommu.c   | 12 ++--
  drivers/iommu/intel/iommu.c |  5 +
  5 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 6e75a2d689b4..758ca4694257 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -19,7 +19,7 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, 
dma_addr_t base);
  void iommu_put_dma_cookie(struct iommu_domain *domain);
  
  /* Setup call for arch DMA mapping code */

-void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size);
+void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit);
  
  /* The DMA API isn't _quite_ the whole story, though... */

  /*
@@ -50,7 +50,7 @@ struct msi_msg;
  struct device;
  
  static inline void iommu_setup_dma_ops(struct device *dev, u64 dma_base,

-   u64 size)
+  u64 dma_limit)
  {
  }
  
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c

index 4bf1dd3eb041..6719f9efea09 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -50,7 +50,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 
size,
  
  	dev->dma_coherent = coherent;

if (iommu)
-   iommu_setup_dma_ops(dev, dma_base, size);
+   iommu_setup_dma_ops(dev, dma_base, dma_base + size - 1);
  
  #ifdef CONFIG_XEN

if (xen_swiotlb_detect())
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 3ac42bbdefc6..216323fb27ef 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1713,7 +1713,7 @@ static void amd_iommu_probe_finalize(struct device *dev)
/* Domains are initialized for this device - have a look what we ended 
up with */
domain = iommu_get_domain_for_dev(dev);
if (domain->type == IOMMU_DOMAIN_DMA)
-   iommu_setup_dma_ops(dev, IOVA_START_PFN << PAGE_SHIFT, 0);
+   iommu_setup_dma_ops(dev, 0, U64_MAX);
else
set_dma_ops(dev, NULL);
  }
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7bcdd1205535..c62e19bed302 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -319,16 +319,16 @@ static bool dev_is_untrusted(struct device *dev)
   * iommu_dma_init_domain - Initialise a DMA mapping domain
   * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
   * @base: IOVA at which the mappable address space starts
- * @size: Size of IOVA space
+ * @limit: Last address of the IOVA space
   * @dev: Device the domain is being initialised for
   *
- * @base and @size should be exact multiples of IOMMU page granularity to
+ * @base and @limit + 1 should be exact multiples of IOMMU page granularity to
   * avoid rounding surprises. If necessary, we reserve the page at address 0
   * to ensure it is an invalid IOVA. It is safe to reinitialise a domain, but
   * any change which could make prior IOVAs invalid will fail.
   */
  static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
-   u64 size, struct device *dev)
+dma_addr_t limit, struct device *dev)
  {
struct iommu_dma_cookie *cookie = domain->iova_cookie;
unsigned long order, base_pfn;
@@ -346,7 +346,7 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
/* Check the domain allows at least some access to the device... */
if (domain->geometry.force_aperture) {
if (base > domain->geometry.aperture_end ||
-   base + size <= domain->geometry.aperture_start) {
+   limit < domain->geometry.aperture_start) {
pr_warn("specified DMA range outside IOMMU 
capability\n");
return -EFAULT;
}
@@ -1308,7 +1308,7 @@ static const struct dma_map_ops iommu_dma_ops = {
   * 

Re: [PATCH v4 5/6] iommu/dma: Simplify calls to iommu_setup_dma_ops()

2021-06-18 Thread Robin Murphy

On 2021-06-18 11:50, Jean-Philippe Brucker wrote:

On Wed, Jun 16, 2021 at 06:02:39PM +0100, Robin Murphy wrote:

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index c62e19bed302..175f8eaeb5b3 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1322,7 +1322,9 @@ void iommu_setup_dma_ops(struct device *dev, u64 
dma_base, u64 dma_limit)
if (domain->type == IOMMU_DOMAIN_DMA) {
if (iommu_dma_init_domain(domain, dma_base, dma_limit, dev))
goto out_err;
-   dev->dma_ops = &iommu_dma_ops;
+   set_dma_ops(dev, &iommu_dma_ops);
+   } else {
+   set_dma_ops(dev, NULL);


I'm not keen on moving this here, since iommu-dma only knows that its own
ops are right for devices it *is* managing; it can't assume any particular
ops are appropriate for devices it isn't. The idea here is that
arch_setup_dma_ops() may have already set the appropriate ops for the
non-IOMMU case, so if the default domain type is passthrough then we leave
those in place.

For example, I do still plan to revisit my conversion of arch/arm someday,
at which point I'd have to undo this for that reason.


Makes sense, I'll remove this bit.


Simplifying the base and size arguments is of course fine, but TBH I'd say
rip the whole bloody lot out of the arch_setup_dma_ops() flow now. It's a
considerable faff passing them around for nothing but a tenuous sanity check
in iommu_dma_init_domain(), and now that dev->dma_range_map is a common
thing we should expect that to give us any relevant limitations if we even
still care.


So I started working on this but it gets too bulky for a preparatory
patch. Dropping the parameters from arch_setup_dma_ops() seems especially
complicated because arm32 does need the size parameter for IOMMU mappings
and that value falls back to the bus DMA mask or U32_MAX in the absence of
dma-ranges. I could try to dig into this for a separate series.

Even only dropping the parameters from iommu_setup_dma_ops() isn't
completely trivial (8 files changed, 55 insertions(+), 36 deletions(-)
because we still need the lower IOVA limit from dma_range_map), so I'd
rather send it separately and have it sit in -next for a while.


Oh, sure, I didn't mean to imply that the whole cleanup should be within 
the scope of this series, just that we can shave off as much as we *do* 
need to touch here (which TBH is pretty much what you're doing already), 
and mainly to start taking the attitude that these arguments are now 
superseded and increasingly vestigial.


I expected the cross-arch cleanup to be a bit fiddly, but I'd forgotten 
that arch/arm was still actively using these values, so maybe I can 
revisit this when I pick up my iommu-dma conversion again (I swear it's 
not dead, just resting!)


Cheers,
Robin.


Re: [PATCH v4 2/6] ACPI: Move IOMMU setup code out of IORT

2021-06-18 Thread Robin Murphy

On 2021-06-18 08:41, Jean-Philippe Brucker wrote:

Hi Eric,

On Wed, Jun 16, 2021 at 11:35:13AM +0200, Eric Auger wrote:

-const struct iommu_ops *iort_iommu_configure_id(struct device *dev,
-   const u32 *id_in)
+int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
  {
struct acpi_iort_node *node;
-   const struct iommu_ops *ops;
+   const struct iommu_ops *ops = NULL;


Oops, I need to remove this (and add -Werror to my tests.)



+static const struct iommu_ops *acpi_iommu_configure_id(struct device *dev,
+  const u32 *id_in)
+{
+   int err;
+   const struct iommu_ops *ops;
+
+   /*
+* If we already translated the fwspec there is nothing left to do,
+* return the iommu_ops.
+*/
+   ops = acpi_iommu_fwspec_ops(dev);
+   if (ops)
+   return ops;
+
+   err = iort_iommu_configure_id(dev, id_in);
+
+   /*
+* If we have reason to believe the IOMMU driver missed the initial
+* add_device callback for dev, replay it to get things in order.
+*/
+   if (!err && dev->bus && !device_iommu_mapped(dev))
+   err = iommu_probe_device(dev);

Previously we had:
     if (!err) {
         ops = iort_fwspec_iommu_ops(dev);
         err = iort_add_device_replay(dev);
     }

Please can you explain the transform? I see the
acpi_iommu_fwspec_ops call below but it is not straightforward to me.


I figured that iort_add_device_replay() is only used once and is
sufficiently simple to be inlined manually (saving 10 lines). Then I
replaced the ops assignment with returns, which saves another line and may
be slightly clearer?  I guess it's mostly a matter of taste, the behavior
should be exactly the same.


Right, IIRC the multiple assignments to ops were more of a haphazard 
evolution inherited from the DT version, and looking at it now I think 
the multiple-return is indeed a bit nicer.


Similarly, it looks like the factoring out of iort_add_device_replay() 
was originally an attempt to encapsulate the IOMMU_API dependency, but 
things have moved around a lot since then, so that seems like a sensible 
simplification to make too.


Robin.




Also the comment mentions replay. Unsure if it is still OK.


The "replay" part is, but "add_device" isn't accurate because it has since
been replaced by probe_device. I'll refresh the comment.

Thanks,
Jean

Re: [PATCH v4 5/6] iommu/dma: Simplify calls to iommu_setup_dma_ops()

2021-06-16 Thread Robin Murphy

On 2021-06-10 08:51, Jean-Philippe Brucker wrote:

dma-iommu uses the address bounds described in domain->geometry during
IOVA allocation. The address size parameters of iommu_setup_dma_ops()
are useful for describing additional limits set by the platform
firmware, but aren't needed for drivers that call this function from
probe_finalize(). The base parameter can be zero because dma-iommu
already removes the first IOVA page, and the limit parameter can be
U64_MAX because it's only checked against the domain geometry. Simplify
calls to iommu_setup_dma_ops().

Signed-off-by: Jean-Philippe Brucker 
---
  drivers/iommu/amd/iommu.c   |  9 +
  drivers/iommu/dma-iommu.c   |  4 +++-
  drivers/iommu/intel/iommu.c | 10 +-
  3 files changed, 5 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 94b96d81fcfd..d3123bc05c08 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1708,14 +1708,7 @@ static struct iommu_device 
*amd_iommu_probe_device(struct device *dev)
  
  static void amd_iommu_probe_finalize(struct device *dev)

  {
-   struct iommu_domain *domain;
-
-   /* Domains are initialized for this device - have a look what we ended 
up with */
-   domain = iommu_get_domain_for_dev(dev);
-   if (domain->type == IOMMU_DOMAIN_DMA)
-   iommu_setup_dma_ops(dev, IOVA_START_PFN << PAGE_SHIFT, U64_MAX);
-   else
-   set_dma_ops(dev, NULL);
+   iommu_setup_dma_ops(dev, 0, U64_MAX);
  }
  
  static void amd_iommu_release_device(struct device *dev)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index c62e19bed302..175f8eaeb5b3 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1322,7 +1322,9 @@ void iommu_setup_dma_ops(struct device *dev, u64 
dma_base, u64 dma_limit)
if (domain->type == IOMMU_DOMAIN_DMA) {
if (iommu_dma_init_domain(domain, dma_base, dma_limit, dev))
goto out_err;
-   dev->dma_ops = &iommu_dma_ops;
+   set_dma_ops(dev, &iommu_dma_ops);
+   } else {
+   set_dma_ops(dev, NULL);


I'm not keen on moving this here, since iommu-dma only knows that its 
own ops are right for devices it *is* managing; it can't assume any 
particular ops are appropriate for devices it isn't. The idea here is 
that arch_setup_dma_ops() may have already set the appropriate ops for 
the non-IOMMU case, so if the default domain type is passthrough then we 
leave those in place.


For example, I do still plan to revisit my conversion of arch/arm 
someday, at which point I'd have to undo this for that reason.


Simplifying the base and size arguments is of course fine, but TBH I'd 
say rip the whole bloody lot out of the arch_setup_dma_ops() flow now. 
It's a considerable faff passing them around for nothing but a tenuous 
sanity check in iommu_dma_init_domain(), and now that dev->dma_range_map 
is a common thing we should expect that to give us any relevant 
limitations if we even still care.


That said, those are all things which can be fixed up later if the 
series is otherwise ready to go and there's still a chance of landing it 
for 5.14. If you do have any other reason to respin, then I think the 
x86 probe_finalize functions simply want an unconditional 
set_dma_ops(dev, NULL) before the iommu_setup_dma_ops() call.
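
Concretely, the suggestion is that the x86 probe_finalize functions end up
shaped something like this (an untested sketch of the AMD one, using the
simplified call from this patch):

static void amd_iommu_probe_finalize(struct device *dev)
{
	/*
	 * Reset to the default ops first; iommu-dma then installs its own
	 * ops only if it actually owns the device's default domain.
	 */
	set_dma_ops(dev, NULL);
	iommu_setup_dma_ops(dev, 0, U64_MAX);
}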


Cheers,
Robin.


}
  
  	return;

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 85f18342603c..8d866940692a 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5165,15 +5165,7 @@ static void intel_iommu_release_device(struct device 
*dev)
  
  static void intel_iommu_probe_finalize(struct device *dev)

  {
-   dma_addr_t base = IOVA_START_PFN << VTD_PAGE_SHIFT;
-   struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
-   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
-
-   if (domain && domain->type == IOMMU_DOMAIN_DMA)
-   iommu_setup_dma_ops(dev, base,
-   __DOMAIN_MAX_ADDR(dmar_domain->gaw));
-   else
-   set_dma_ops(dev, NULL);
+   iommu_setup_dma_ops(dev, 0, U64_MAX);
  }
  
  static void intel_iommu_get_resv_regions(struct device *device,





Re: [PATCH v1 5/8] dma: Use size for swiotlb boundary checks

2021-06-03 Thread Robin Murphy

On 2021-06-03 01:41, Andi Kleen wrote:

swiotlb currently only uses the start address of a DMA to check if something
is in the swiotlb or not. But with virtio and untrusted hosts the host
could give some DMA mapping that crosses the swiotlb boundaries,
potentially leaking or corrupting data. Add size checks to all the swiotlb
checks and reject any DMAs that cross the swiotlb buffer boundaries.

Signed-off-by: Andi Kleen 
---
  drivers/iommu/dma-iommu.c   | 13 ++---
  drivers/xen/swiotlb-xen.c   | 11 ++-
  include/linux/dma-mapping.h |  4 ++--
  include/linux/swiotlb.h |  8 +---
  kernel/dma/direct.c |  8 
  kernel/dma/direct.h |  8 
  kernel/dma/mapping.c|  4 ++--
  net/xdp/xsk_buff_pool.c |  2 +-
  8 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7bcdd1205535..7ef13198721b 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -504,7 +504,7 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, 
dma_addr_t dma_addr,
  
  	__iommu_dma_unmap(dev, dma_addr, size);


If you can't trust size below then you've already corrupted the IOMMU 
pagetables here :/


Robin.


-   if (unlikely(is_swiotlb_buffer(phys)))
+   if (unlikely(is_swiotlb_buffer(phys, size)))
swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
  }
  
@@ -575,7 +575,7 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,

}
  
  	iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);

-   if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
+   if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys, org_size))
swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
return iova;
  }
@@ -781,7 +781,7 @@ static void iommu_dma_sync_single_for_cpu(struct device 
*dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(phys, size, dir);
  
-	if (is_swiotlb_buffer(phys))

+   if (is_swiotlb_buffer(phys, size))
swiotlb_sync_single_for_cpu(dev, phys, size, dir);
  }
  
@@ -794,7 +794,7 @@ static void iommu_dma_sync_single_for_device(struct device *dev,

return;
  
  	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);

-   if (is_swiotlb_buffer(phys))
+   if (is_swiotlb_buffer(phys, size))
swiotlb_sync_single_for_device(dev, phys, size, dir);
  
  	if (!dev_is_dma_coherent(dev))

@@ -815,7 +815,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
  
-		if (is_swiotlb_buffer(sg_phys(sg)))

+   if (is_swiotlb_buffer(sg_phys(sg), sg->length))
swiotlb_sync_single_for_cpu(dev, sg_phys(sg),
sg->length, dir);
}
@@ -832,10 +832,9 @@ static void iommu_dma_sync_sg_for_device(struct device 
*dev,
return;
  
  	for_each_sg(sgl, sg, nelems, i) {

-   if (is_swiotlb_buffer(sg_phys(sg)))
+   if (is_swiotlb_buffer(sg_phys(sg), sg->length))
swiotlb_sync_single_for_device(dev, sg_phys(sg),
   sg->length, dir);
-
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_device(sg_phys(sg), sg->length, dir);
}
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 24d11861ac7d..333846af8d35 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -89,7 +89,8 @@ static inline int range_straddles_page_boundary(phys_addr_t 
p, size_t size)
return 0;
  }
  
-static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)

+static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr,
+size_t size)
  {
unsigned long bfn = XEN_PFN_DOWN(dma_to_phys(dev, dma_addr));
unsigned long xen_pfn = bfn_to_local_pfn(bfn);
@@ -100,7 +101,7 @@ static int is_xen_swiotlb_buffer(struct device *dev, 
dma_addr_t dma_addr)
 * in our domain. Therefore _only_ check address within our domain.
 */
if (pfn_valid(PFN_DOWN(paddr)))
-   return is_swiotlb_buffer(paddr);
+   return is_swiotlb_buffer(paddr, size);
return 0;
  }
  
@@ -431,7 +432,7 @@ static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,

}
  
  	/* NOTE: We use dev_addr here, not paddr! */

-   if (is_xen_swiotlb_buffer(hwdev, dev_addr))
+   if (is_xen_swiotlb_buffer(hwdev, dev_addr, size))
swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs);
  }
  
@@ -448,7 +449,7 @@ xen_swiotlb_sync_single_for_cpu(struct device *dev, dma_addr_t dma_addr,

  

Re: [PATCH v1 6/8] dma: Add return value to dma_unmap_page

2021-06-03 Thread Robin Murphy

Hi Andi,

On 2021-06-03 01:41, Andi Kleen wrote:

In some situations when we know swiotlb is forced and we have
to deal with untrusted hosts, it's useful to know if a mapping
was in the swiotlb or not. This allows us to abort any IO
operation that would access memory outside the swiotlb.

Otherwise it might be possible for a malicious host to inject
any guest page in a read operation. While it couldn't directly
access the results of the read() inside the guest, there
might be scenarios where data is echoed back with a write(),
and that would then leak guest memory.

Add a return value to dma_unmap_single/page. Most users
of course will ignore it. The return value is set to EIO
if we're in forced swiotlb mode and the buffer is not inside
the swiotlb buffer. Otherwise it's always 0.


I have to say my first impression of this isn't too good :(

What it looks like to me is abusing SWIOTLB's internal housekeeping to 
keep track of virtio-specific state. The DMA API does not attempt to 
validate calls in general since in many cases the additional overhead 
would be prohibitive. It has always been callers' responsibility to keep 
track of what they mapped and make sure sync/unmap calls match, and 
there are many, many, subtle and not-so-subtle ways for things to go 
wrong if they don't. If virtio is not doing a good enough job of that, 
what's the justification for making it the DMA API's problem?
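
To illustrate that convention with a sketch (made-up "foo" driver, nothing from this series): the driver records what it mapped and unmaps with exactly the same parameters, because nothing else will check it for it.

#include <linux/dma-mapping.h>

struct foo_tx_buf {
	void		*cpu;
	dma_addr_t	dma;
	size_t		len;
};

static int foo_map_tx(struct device *dev, struct foo_tx_buf *buf)
{
	buf->dma = dma_map_single(dev, buf->cpu, buf->len, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, buf->dma))
		return -ENOMEM;
	return 0;
}

static void foo_unmap_tx(struct device *dev, struct foo_tx_buf *buf)
{
	/* Must mirror the map call exactly: same handle, size and direction. */
	dma_unmap_single(dev, buf->dma, buf->len, DMA_TO_DEVICE);
}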



A new callback is used to avoid changing all the IOMMU drivers.


Nit: presumably by "IOMMU drivers" you actually mean arch DMA API backends?

As an aside, we'll take a look at the rest of the series from the 
perspective of our prototyping for Arm's Confidential Compute 
Architecture, but I'm not sure we'll need it, since accesses beyond the 
bounds of the shared SWIOTLB buffer shouldn't be an issue for us. 
Furthermore, AFAICS it's still not going to help against exfiltrating 
guest memory by over-unmapping the original SWIOTLB slot *without* going 
past the end of the whole buffer, but I think Martin's patch *has* 
addressed that already.


Robin.


Signed-off-by: Andi Kleen 
---
  drivers/iommu/dma-iommu.c   | 17 +++--
  include/linux/dma-map-ops.h |  3 +++
  include/linux/dma-mapping.h |  7 ---
  kernel/dma/mapping.c|  6 +-
  4 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7ef13198721b..babe46f2ae3a 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -491,7 +491,8 @@ static void __iommu_dma_unmap(struct device *dev, 
dma_addr_t dma_addr,
iommu_dma_free_iova(cookie, dma_addr, size, iotlb_gather.freelist);
  }
  
-static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr,

+static int __iommu_dma_unmap_swiotlb_check(struct device *dev,
+   dma_addr_t dma_addr,
size_t size, enum dma_data_direction dir,
unsigned long attrs)
  {
@@ -500,12 +501,15 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, 
dma_addr_t dma_addr,
  
  	phys = iommu_iova_to_phys(domain, dma_addr);

if (WARN_ON(!phys))
-   return;
+   return -EIO;
  
  	__iommu_dma_unmap(dev, dma_addr, size);
  
  	if (unlikely(is_swiotlb_buffer(phys, size)))

swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
+   else if (swiotlb_force == SWIOTLB_FORCE)
+   return -EIO;
+   return 0;
  }
  
  static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,

@@ -856,12 +860,13 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, 
struct page *page,
return dma_handle;
  }
  
-static void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,

+static int iommu_dma_unmap_page_check(struct device *dev, dma_addr_t 
dma_handle,
size_t size, enum dma_data_direction dir, unsigned long attrs)
  {
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
iommu_dma_sync_single_for_cpu(dev, dma_handle, size, dir);
-   __iommu_dma_unmap_swiotlb(dev, dma_handle, size, dir, attrs);
+   return __iommu_dma_unmap_swiotlb_check(dev, dma_handle, size, dir,
+  attrs);
  }
  
  /*

@@ -946,7 +951,7 @@ static void iommu_dma_unmap_sg_swiotlb(struct device *dev, 
struct scatterlist *s
int i;
  
  	for_each_sg(sg, s, nents, i)

-   __iommu_dma_unmap_swiotlb(dev, sg_dma_address(s),
+   __iommu_dma_unmap_swiotlb_check(dev, sg_dma_address(s),
sg_dma_len(s), dir, attrs);
  }
  
@@ -1291,7 +1296,7 @@ static const struct dma_map_ops iommu_dma_ops = {

.mmap   = iommu_dma_mmap,
.get_sgtable= iommu_dma_get_sgtable,
.map_page   = iommu_dma_map_page,
-   .unmap_page = iommu_dma_unmap_page,
+   .unmap_page_check   = iommu_dma_unmap_page_check,
.map_sg   

Re: [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE

2021-03-31 Thread Robin Murphy

On 2021-03-16 15:38, Christoph Hellwig wrote:
[...]

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index f1e38526d5bd40..996dfdf9d375dd 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2017,7 +2017,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain 
*domain,
.iommu_dev  = smmu->dev,
};
  
-	if (smmu_domain->non_strict)

+   if (!iommu_get_dma_strict())


As Will raised, this also needs to be checking "domain->type == 
IOMMU_DOMAIN_DMA" to maintain equivalent behaviour to the attribute code 
below.



pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
  
  	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);

@@ -2449,52 +2449,6 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
return group;
  }
  
-static int arm_smmu_domain_get_attr(struct iommu_domain *domain,

-   enum iommu_attr attr, void *data)
-{
-   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
-
-   switch (domain->type) {
-   case IOMMU_DOMAIN_DMA:
-   switch (attr) {
-   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   *(int *)data = smmu_domain->non_strict;
-   return 0;
-   default:
-   return -ENODEV;
-   }
-   break;
-   default:
-   return -EINVAL;
-   }
-}

[...]

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index f985817c967a25..edb1de479dd1a7 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -668,7 +668,6 @@ struct arm_smmu_domain {
struct mutexinit_mutex; /* Protects smmu pointer */
  
  	struct io_pgtable_ops		*pgtbl_ops;

-   boolnon_strict;
atomic_tnr_ats_masters;
  
  	enum arm_smmu_domain_stage	stage;

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 0aa6d667274970..3dde22b1f8ffb0 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -761,6 +761,9 @@ static int arm_smmu_init_domain_context(struct iommu_domain 
*domain,
.iommu_dev  = smmu->dev,
};
  
+	if (!iommu_get_dma_strict())


Ditto here.

Sorry for not spotting that sooner :(

Robin.


+   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
+
if (smmu->impl && smmu->impl->init_context) {
		ret = smmu->impl->init_context(smmu_domain, &pgtbl_cfg, dev);
if (ret)



Re: [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE

2021-03-31 Thread Robin Murphy

On 2021-03-31 16:32, Will Deacon wrote:

On Wed, Mar 31, 2021 at 02:09:37PM +0100, Robin Murphy wrote:

On 2021-03-31 12:49, Will Deacon wrote:

On Tue, Mar 30, 2021 at 05:28:19PM +0100, Robin Murphy wrote:

On 2021-03-30 14:58, Will Deacon wrote:

On Tue, Mar 30, 2021 at 02:19:38PM +0100, Robin Murphy wrote:

On 2021-03-30 14:11, Will Deacon wrote:

On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote:

From: Robin Murphy 

Instead make the global iommu_dma_strict parameter in iommu.c canonical by
exporting helpers to get and set it and use those directly in the drivers.

This makes sure that the iommu.strict parameter also works for the AMD and
Intel IOMMU drivers on x86.  As those default to lazy flushing, a new
IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to
represent the default if not overridden by an explicit parameter.

Signed-off-by: Robin Murphy .
[ported on top of the other iommu_attr changes and added a few small
 missing bits]
Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/amd/iommu.c   | 23 +---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 -
 drivers/iommu/arm/arm-smmu/arm-smmu.c   | 27 +
 drivers/iommu/dma-iommu.c   |  9 +--
 drivers/iommu/intel/iommu.c | 64 -
 drivers/iommu/iommu.c   | 27 ++---
 include/linux/iommu.h   |  4 +-
 8 files changed, 40 insertions(+), 165 deletions(-)


I really like this cleanup, but I can't help wonder if it's going in the
wrong direction. With SoCs often having multiple IOMMU instances and a
distinction between "trusted" and "untrusted" devices, then having the
flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound
unreasonable to me, but this change makes it a global property.


The intent here was just to streamline the existing behaviour of stuffing a
global property into a domain attribute then pulling it out again in the
illusion that it was in any way per-domain. We're still checking
dev_is_untrusted() before making an actual decision, and it's not like we
can't add more factors at that point if we want to.


Like I say, the cleanup is great. I'm just wondering whether there's a
better way to express the complicated logic to decide whether or not to use
the flush queue than what we end up with:

if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) &&
domain->ops->flush_iotlb_all && !iommu_get_dma_strict())

which is mixing up globals, device properties and domain properties. The
result is that the driver code ends up just using the global to determine
whether or not to pass IO_PGTABLE_QUIRK_NON_STRICT to the page-table code,
which is a departure from the current way of doing things.


But previously, SMMU only ever saw the global policy piped through the
domain attribute by iommu_group_alloc_default_domain(), so there's no
functional change there.


For DMA domains sure, but I don't think that's the case for unmanaged
domains such as those used by VFIO.


Eh? This is only relevant to DMA domains anyway. Flush queues are part of
the IOVA allocator that VFIO doesn't even use. It's always been the case
that unmanaged domains only use strict invalidation.


Maybe I'm going mad. With this patch, the SMMU driver unconditionally sets
IO_PGTABLE_QUIRK_NON_STRICT for page-tables if iommu_get_dma_strict() is
true, no? In which case, that will get set for page-tables corresponding
to unmanaged domains as well as DMA domains when it is enabled. That didn't
happen before because you couldn't set the attribute for unmanaged domains.

What am I missing?


Oh cock... sorry, all this time I've been saying what I *expect* it to 
do, while overlooking the fact that the IO_PGTABLE_QUIRK_NON_STRICT 
hunks were the bits I forgot to write and Christoph had to fix up. 
Indeed, those should be checking the domain type too to preserve the 
existing behaviour. Apologies for the confusion.
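
Concretely, the missing check amounts to something like this (sketch of the intent, not the exact hunk):

	/* Only DMA domains should ever go non-strict; unmanaged (e.g. VFIO)
	 * domains must keep strict invalidation. */
	if (domain->type == IOMMU_DOMAIN_DMA && !iommu_get_dma_strict())
		pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;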


Robin.


Obviously some of the above checks could be factored out into some kind of
iommu_use_flush_queue() helper that IOMMU drivers can also call if they need
to keep in sync. Or maybe we just allow iommu-dma to set
IO_PGTABLE_QUIRK_NON_STRICT directly via iommu_set_pgtable_quirks() if we're
treating that as a generic thing now.


I think a helper that takes a domain would be a good starting point.


You mean device, right? The one condition we currently have is at the device
level, and there's really nothing inherent to the domain itself that matters
(since the type is implicitly IOMMU_DOMAIN_DMA to even care about this).


Device would probably work too; you'd pass the first device to attach to the
domain when querying this from the SMMU driver, I suppose.

Will



Re: [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE

2021-03-31 Thread Robin Murphy

On 2021-03-31 12:49, Will Deacon wrote:

On Tue, Mar 30, 2021 at 05:28:19PM +0100, Robin Murphy wrote:

On 2021-03-30 14:58, Will Deacon wrote:

On Tue, Mar 30, 2021 at 02:19:38PM +0100, Robin Murphy wrote:

On 2021-03-30 14:11, Will Deacon wrote:

On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote:

From: Robin Murphy 

Instead make the global iommu_dma_strict parameter in iommu.c canonical by
exporting helpers to get and set it and use those directly in the drivers.

This makes sure that the iommu.strict parameter also works for the AMD and
Intel IOMMU drivers on x86.  As those default to lazy flushing, a new
IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to
represent the default if not overridden by an explicit parameter.

Signed-off-by: Robin Murphy .
[ported on top of the other iommu_attr changes and added a few small
missing bits]
Signed-off-by: Christoph Hellwig 
---
drivers/iommu/amd/iommu.c   | 23 +---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 -
drivers/iommu/arm/arm-smmu/arm-smmu.c   | 27 +
drivers/iommu/dma-iommu.c   |  9 +--
drivers/iommu/intel/iommu.c | 64 -
drivers/iommu/iommu.c   | 27 ++---
include/linux/iommu.h   |  4 +-
8 files changed, 40 insertions(+), 165 deletions(-)


I really like this cleanup, but I can't help wonder if it's going in the
wrong direction. With SoCs often having multiple IOMMU instances and a
distinction between "trusted" and "untrusted" devices, then having the
flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound
unreasonable to me, but this change makes it a global property.


The intent here was just to streamline the existing behaviour of stuffing a
global property into a domain attribute then pulling it out again in the
illusion that it was in any way per-domain. We're still checking
dev_is_untrusted() before making an actual decision, and it's not like we
can't add more factors at that point if we want to.


Like I say, the cleanup is great. I'm just wondering whether there's a
better way to express the complicated logic to decide whether or not to use
the flush queue than what we end up with:

if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) &&
domain->ops->flush_iotlb_all && !iommu_get_dma_strict())

which is mixing up globals, device properties and domain properties. The
result is that the driver code ends up just using the global to determine
whether or not to pass IO_PGTABLE_QUIRK_NON_STRICT to the page-table code,
which is a departure from the current way of doing things.


But previously, SMMU only ever saw the global policy piped through the
domain attribute by iommu_group_alloc_default_domain(), so there's no
functional change there.


For DMA domains sure, but I don't think that's the case for unmanaged
domains such as those used by VFIO.


Eh? This is only relevant to DMA domains anyway. Flush queues are part 
of the IOVA allocator that VFIO doesn't even use. It's always been the 
case that unmanaged domains only use strict invalidation.



Obviously some of the above checks could be factored out into some kind of
iommu_use_flush_queue() helper that IOMMU drivers can also call if they need
to keep in sync. Or maybe we just allow iommu-dma to set
IO_PGTABLE_QUIRK_NON_STRICT directly via iommu_set_pgtable_quirks() if we're
treating that as a generic thing now.


I think a helper that takes a domain would be a good starting point.


You mean device, right? The one condition we currently have is at the 
device level, and there's really nothing inherent to the domain itself 
that matters (since the type is implicitly IOMMU_DOMAIN_DMA to even care 
about this).
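
As a rough sketch of what such a helper could look like if it took the device (name, signature and placement all hypothetical):

/*
 * One place for the "should this DMA domain get a flush queue?" policy,
 * combining the global option with the per-device untrusted check, so
 * iommu-dma and IOMMU drivers stay in sync.
 */
static bool iommu_use_flush_queue(struct device *dev, struct iommu_domain *domain)
{
	if (domain->type != IOMMU_DOMAIN_DMA)
		return false;
	if (dev && dev_is_untrusted(dev))
		return false;
	return !iommu_get_dma_strict() && domain->ops->flush_iotlb_all;
}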


Another idea that's just come to mind is now that IOMMU_DOMAIN_DMA has a 
standard meaning, maybe we could split out a separate 
IOMMU_DOMAIN_DMA_STRICT type such that it can all propagate from 
iommu_get_def_domain_type()? That feels like it might be quite 
promising, but I'd still do it as an improvement on top of this patch, 
since it's beyond just cleaning up the abuse of domain attributes to 
pass a command-line option around.


Robin.


Re: [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE

2021-03-30 Thread Robin Murphy

On 2021-03-30 14:58, Will Deacon wrote:

On Tue, Mar 30, 2021 at 02:19:38PM +0100, Robin Murphy wrote:

On 2021-03-30 14:11, Will Deacon wrote:

On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote:

From: Robin Murphy 

Instead make the global iommu_dma_strict parameter in iommu.c canonical by
exporting helpers to get and set it and use those directly in the drivers.

This makes sure that the iommu.strict parameter also works for the AMD and
Intel IOMMU drivers on x86.  As those default to lazy flushing, a new
IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to
represent the default if not overridden by an explicit parameter.

Signed-off-by: Robin Murphy .
[ported on top of the other iommu_attr changes and added a few small
   missing bits]
Signed-off-by: Christoph Hellwig 
---
   drivers/iommu/amd/iommu.c   | 23 +---
   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +---
   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 -
   drivers/iommu/arm/arm-smmu/arm-smmu.c   | 27 +
   drivers/iommu/dma-iommu.c   |  9 +--
   drivers/iommu/intel/iommu.c | 64 -
   drivers/iommu/iommu.c   | 27 ++---
   include/linux/iommu.h   |  4 +-
   8 files changed, 40 insertions(+), 165 deletions(-)


I really like this cleanup, but I can't help wonder if it's going in the
wrong direction. With SoCs often having multiple IOMMU instances and a
distinction between "trusted" and "untrusted" devices, then having the
flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound
unreasonable to me, but this change makes it a global property.


The intent here was just to streamline the existing behaviour of stuffing a
global property into a domain attribute then pulling it out again in the
illusion that it was in any way per-domain. We're still checking
dev_is_untrusted() before making an actual decision, and it's not like we
can't add more factors at that point if we want to.


Like I say, the cleanup is great. I'm just wondering whether there's a
better way to express the complicated logic to decide whether or not to use
the flush queue than what we end up with:

if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) &&
domain->ops->flush_iotlb_all && !iommu_get_dma_strict())

which is mixing up globals, device properties and domain properties. The
result is that the driver code ends up just using the global to determine
whether or not to pass IO_PGTABLE_QUIRK_NON_STRICT to the page-table code,
which is a departure from the current way of doing things.


But previously, SMMU only ever saw the global policy piped through the 
domain attribute by iommu_group_alloc_default_domain(), so there's no 
functional change there.


Obviously some of the above checks could be factored out into some kind 
of iommu_use_flush_queue() helper that IOMMU drivers can also call if 
they need to keep in sync. Or maybe we just allow iommu-dma to set 
IO_PGTABLE_QUIRK_NON_STRICT directly via iommu_set_pgtable_quirks() if 
we're treating that as a generic thing now.



For example, see the recent patch from Lu Baolu:

https://lore.kernel.org/r/20210225061454.2864009-1-baolu...@linux.intel.com


Erm, this patch is based on that one, it's right there in the context :/


Ah, sorry, I didn't spot that! I was just trying to illustrate that this
is per-device.


Sure, I understand - and I'm just trying to bang home that despite 
appearances it's never actually been treated as such for SMMU, so 
anything that's wrong after this change was already wrong before.


Robin.


Re: [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE

2021-03-30 Thread Robin Murphy

On 2021-03-30 14:11, Will Deacon wrote:

On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote:

From: Robin Murphy 

Instead make the global iommu_dma_strict parameter in iommu.c canonical by
exporting helpers to get and set it and use those directly in the drivers.

This makes sure that the iommu.strict parameter also works for the AMD and
Intel IOMMU drivers on x86.  As those default to lazy flushing, a new
IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to
represent the default if not overridden by an explicit parameter.

Signed-off-by: Robin Murphy .
[ported on top of the other iommu_attr changes and added a few small
  missing bits]
Signed-off-by: Christoph Hellwig 
---
  drivers/iommu/amd/iommu.c   | 23 +---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 -
  drivers/iommu/arm/arm-smmu/arm-smmu.c   | 27 +
  drivers/iommu/dma-iommu.c   |  9 +--
  drivers/iommu/intel/iommu.c | 64 -
  drivers/iommu/iommu.c   | 27 ++---
  include/linux/iommu.h   |  4 +-
  8 files changed, 40 insertions(+), 165 deletions(-)


I really like this cleanup, but I can't help wonder if it's going in the
wrong direction. With SoCs often having multiple IOMMU instances and a
distinction between "trusted" and "untrusted" devices, then having the
flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound
unreasonable to me, but this change makes it a global property.


The intent here was just to streamline the existing behaviour of 
stuffing a global property into a domain attribute then pulling it out 
again in the illusion that it was in any way per-domain. We're still 
checking dev_is_untrusted() before making an actual decision, and it's 
not like we can't add more factors at that point if we want to.



For example, see the recent patch from Lu Baolu:

https://lore.kernel.org/r/20210225061454.2864009-1-baolu...@linux.intel.com


Erm, this patch is based on that one, it's right there in the context :/

Thanks,
Robin.


Re: [PATCH 2/3] ACPI: Add driver for the VIOT table

2021-03-18 Thread Robin Murphy

On 2021-03-16 19:16, Jean-Philippe Brucker wrote:

The ACPI Virtual I/O Translation Table describes topology of
para-virtual platforms. For now it describes the relation between
virtio-iommu and the endpoints it manages. Supporting that requires
three steps:

(1) acpi_viot_init(): parse the VIOT table, build a list of endpoints
 and vIOMMUs.

(2) acpi_viot_set_iommu_ops(): when the vIOMMU driver is loaded and the
 device probed, register it to the VIOT driver. This step is required
 because unlike similar drivers, VIOT doesn't create the vIOMMU
 device.


Note that you're basically the same as the DT case in this regard, so 
I'd expect things to be closer to that pattern than to that of IORT.


[...]

@@ -1506,12 +1507,17 @@ int acpi_dma_configure_id(struct device *dev, enum 
dev_dma_attr attr,
  {
const struct iommu_ops *iommu;
u64 dma_addr = 0, size = 0;
+   int ret;
  
  	if (attr == DEV_DMA_NOT_SUPPORTED) {

		set_dma_ops(dev, &dma_dummy_ops);
return 0;
}
  
+	ret = acpi_viot_dma_setup(dev, attr);

+   if (ret)
+   return ret > 0 ? 0 : ret;


I think things could do with a fair bit of refactoring here. Ideally we 
want to process a possible _DMA method (acpi_dma_get_range()) regardless 
of which flavour of IOMMU table might be present, and the amount of 
duplication we fork into at this point is unfortunate.



+
	iort_dma_setup(dev, &dma_addr, &size);


For starters I think most of that should be dragged out to this level 
here - it's really only the {rc,nc}_dma_get_range() bit that deserves to 
be the IORT-specific call.



iommu = iort_iommu_configure_id(dev, input_id);


Similarly, it feels like it's only the table scan part in the middle of 
that that needs dispatching between IORT/VIOT, and its head and tail 
pulled out into a common path.


[...]

+static const struct iommu_ops *viot_iommu_setup(struct device *dev)
+{
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+   struct viot_iommu *viommu = NULL;
+   struct viot_endpoint *ep;
+   u32 epid;
+   int ret;
+
+   /* Already translated? */
+   if (fwspec && fwspec->ops)
+   return NULL;
+
+   mutex_lock(_lock);
+   list_for_each_entry(ep, _endpoints, list) {
+   if (viot_device_match(dev, &ep->dev_id, &epid)) {
+   epid += ep->endpoint_id;
+   viommu = ep->viommu;
+   break;
+   }
+   }
+   mutex_unlock(_lock);
+   if (!viommu)
+   return NULL;
+
+   /* We're not translating ourself */
+   if (viot_device_match(dev, &viommu->dev_id, &epid))
+   return NULL;
+
+   /*
+* If we found a PCI range managed by the viommu, we're the one that has
+* to request ACS.
+*/
+   if (dev_is_pci(dev))
+   pci_request_acs();
+
+   if (!viommu->ops || WARN_ON(!viommu->dev))
+   return ERR_PTR(-EPROBE_DEFER);


Can you create (or look up) a viommu->fwnode when initially parsing the 
VIOT to represent the IOMMU devices to wait for, such that the 
viot_device_match() lookup can resolve to that and let you fall into the 
standard iommu_ops_from_fwnode() path? That's what I mean about 
following the DT pattern - I guess it might need a bit of trickery to 
rewrite things if iommu_device_register() eventually turns up with a new 
fwnode, so I doubt we can get away without *some* kind of private 
interface between virtio-iommu and VIOT, but it would be nice for the 
common(ish) DMA paths to stay as unaware of the specifics as possible.
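
A sketch of the DT-style flow being suggested (viommu->fwnode is hypothetical here; the point is to resolve the vIOMMU by fwnode and defer until its driver has registered, as the OF path does):

static const struct iommu_ops *viot_iommu_ops_sketch(struct viot_iommu *viommu)
{
	/* Same lookup the OF path relies on; NULL means the vIOMMU driver
	 * isn't registered yet, so the endpoint should defer its probe. */
	const struct iommu_ops *ops = iommu_ops_from_fwnode(viommu->fwnode);

	if (!ops)
		return ERR_PTR(-EPROBE_DEFER);
	return ops;
}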



+
+   ret = iommu_fwspec_init(dev, viommu->dev->fwnode, viommu->ops);
+   if (ret)
+   return ERR_PTR(ret);
+
+   iommu_fwspec_add_ids(dev, &epid, 1);
+
+   /*
+* If we have reason to believe the IOMMU driver missed the initial
+* add_device callback for dev, replay it to get things in order.
+*/
+   if (dev->bus && !device_iommu_mapped(dev))
+   iommu_probe_device(dev);
+
+   return viommu->ops;
+}
+
+/**
+ * acpi_viot_dma_setup - Configure DMA for an endpoint described in VIOT
+ * @dev: the endpoint
+ * @attr: coherency property of the endpoint
+ *
+ * Setup the DMA and IOMMU ops for an endpoint described by the VIOT table.
+ *
+ * Return:
+ * * 0 - @dev doesn't match any VIOT node
+ * * 1 - ops for @dev were successfully installed
+ * * -EPROBE_DEFER - ops for @dev aren't yet available
+ */
+int acpi_viot_dma_setup(struct device *dev, enum dev_dma_attr attr)
+{
+   const struct iommu_ops *iommu_ops = viot_iommu_setup(dev);
+
+   if (IS_ERR_OR_NULL(iommu_ops)) {
+   int ret = PTR_ERR(iommu_ops);
+
+   if (ret == -EPROBE_DEFER || ret == 0)
+   return ret;
+   dev_err(dev, "error %d while setting up virt IOMMU\n", ret);
+   return 0;
+   }
+
+#ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS

Re: [PATCH 3/3] iommu/virtio: Enable x86 support

2021-03-18 Thread Robin Murphy

On 2021-03-16 19:16, Jean-Philippe Brucker wrote:

With the VIOT support in place, x86 platforms can now use the
virtio-iommu.

The arm64 Kconfig selects IOMMU_DMA, while x86 IOMMU drivers select it
themselves.


Actually, now that both AMD and Intel are converted over, maybe it's 
finally time to punt that to x86 arch code to match arm64?


Robin.


Signed-off-by: Jean-Philippe Brucker 
---
  drivers/iommu/Kconfig | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 2819b5c8ec30..ccca83ef2f06 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -400,8 +400,9 @@ config HYPERV_IOMMU
  config VIRTIO_IOMMU
tristate "Virtio IOMMU driver"
depends on VIRTIO
-   depends on ARM64
+   depends on (ARM64 || X86)
select IOMMU_API
+   select IOMMU_DMA if X86
select INTERVAL_TREE
select ACPI_VIOT if ACPI
help




Re: [PATCH 14/17] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE

2021-03-16 Thread Robin Murphy

On 2021-03-15 08:33, Christoph Hellwig wrote:

On Fri, Mar 12, 2021 at 04:18:24PM +, Robin Murphy wrote:

Let me know what you think of the version here:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/iommu-cleanup

I'll happily switch the patch to you as the author if you're fine with
that as well.


I still have reservations about removing the attribute API entirely and
pretending that io_pgtable_cfg is anything other than a SoC-specific
private interface,


I think a private interface would make more sense.  For now I've just
condensed it down to a generic set of quirk bits and dropped the
attrs structure, which seems like an ok middle ground for now.  That
being said I wonder why that quirk isn't simply set in the device
tree?


Because it's a software policy decision rather than any inherent 
property of the platform, and the DT certainly doesn't know *when* any 
particular device might prefer its IOMMU to use cacheable pagetables to 
minimise TLB miss latency vs. saving the cache capacity for larger data 
buffers. It really is most logical to decide this at the driver level.


In truth the overall concept *is* relatively generic (a trend towards 
larger system caches and cleverer usage is about both raw performance 
and saving power on off-SoC DRAM traffic), it's just the particular 
implementation of using io-pgtable to set an outer-cacheable walk 
attribute in an SMMU TCR that's pretty much specific to Qualcomm SoCs. 
Hence why having a common abstraction at the iommu_domain level, but 
where the exact details are free to vary across different IOMMUs and 
their respective client drivers, is in many ways an ideal fit.
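
For reference, the client-side usage in question is tiny - the msm/adreno hunk quoted later in this thread boils down to (illustrative function name):

static void set_llc_attributes_sketch(struct iommu_domain *domain)
{
	/* The GPU driver opts its own domain into outer-cacheable pagetable
	 * walks; the SMMU driver just applies whatever quirk was set. */
	struct io_pgtable_domain_attr pgtbl_cfg = {
		.quirks = IO_PGTABLE_QUIRK_ARM_OUTER_WBWA,
	};

	iommu_domain_set_attr(domain, DOMAIN_ATTR_IO_PGTABLE_CFG, &pgtbl_cfg);
}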



but the reworked patch on its own looks reasonable to
me, thanks! (I wasn't too convinced about the iommu_cmd_line wrappers
either...) Just iommu_get_dma_strict() needs an export since the SMMU
drivers can be modular - I consciously didn't add that myself since I was
mistakenly thinking only iommu-dma would call it.


Fixed.  Can I get your signoff for the patch?  Then I'll switch it to
over to being attributed to you.


Sure - I would have thought that the one I originally posted still 
stands, but for the avoidance of doubt, for the parts of commit 
8b6d45c495bd in your tree that remain from what I wrote:


Signed-off-by: Robin Murphy 

Cheers,
Robin.


Re: [PATCH 14/17] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE

2021-03-12 Thread Robin Murphy

On 2021-03-11 08:26, Christoph Hellwig wrote:

On Wed, Mar 10, 2021 at 06:39:57PM +, Robin Murphy wrote:

Actually... Just mirroring the iommu_dma_strict value into
struct iommu_domain should solve all of that with very little
boilerplate code.


Yes, my initial thought was to directly replace the attribute with a
common flag at iommu_domain level, but since in all cases the behaviour
is effectively global rather than actually per-domain, it seemed
reasonable to take it a step further. This passes compile-testing for
arm64 and x86, what do you think?


It seems to miss a few bits, and also generally seems to not actually
apply to recent mainline or something like it due to different empty
lines in a few places.


Yeah, that was sketched out on top of some other development patches, 
and in being so focused on not breaking any of the x86 behaviours I did 
indeed overlook fully converting the SMMU drivers... oops!


(my thought was to do the conversion for its own sake, then clean up the 
redundant attribute separately, but I guess it's fine either way)



Let me know what you think of the version here:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/iommu-cleanup

I'll happily switch the patch to you as the author if you're fine with
that as well.


I still have reservations about removing the attribute API entirely and 
pretending that io_pgtable_cfg is anything other than a SoC-specific 
private interface, but the reworked patch on its own looks reasonable to 
me, thanks! (I wasn't too convinced about the iommu_cmd_line wrappers 
either...) Just iommu_get_dma_strict() needs an export since the SMMU 
drivers can be modular - I consciously didn't add that myself since I 
was mistakenly thinking only iommu-dma would call it.


Robin.


Re: [PATCH 14/17] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE

2021-03-10 Thread Robin Murphy

On 2021-03-10 09:25, Christoph Hellwig wrote:

On Wed, Mar 10, 2021 at 10:15:01AM +0100, Christoph Hellwig wrote:

On Thu, Mar 04, 2021 at 03:25:27PM +, Robin Murphy wrote:

On 2021-03-01 08:42, Christoph Hellwig wrote:

Use explicit methods for setting and querying the information instead.


Now that everyone's using iommu-dma, is there any point in bouncing this
through the drivers at all? Seems like it would make more sense for the x86
drivers to reflect their private options back to iommu_dma_strict (and
allow Intel's caching mode to override it as well), then have
iommu_dma_init_domain just test !iommu_dma_strict &&
domain->ops->flush_iotlb_all.


Hmm.  I looked at this, and kill off ->dma_enable_flush_queue for
the ARM drivers and just looking at iommu_dma_strict seems like a
very clear win.

OTOH x86 is a little more complicated.  AMD and Intel default to lazy
mode, so we'd have to change the global iommu_dma_strict if they are
initialized.  Also Intel has not only a "static" option to disable
lazy mode, but also a "dynamic" one where it iterates structure.  So
I think on the get side we're stuck with the method, but it still
simplifies the whole thing.


Actually... Just mirroring the iommu_dma_strict value into
struct iommu_domain should solve all of that with very little
boilerplate code.


Yes, my initial thought was to directly replace the attribute with a
common flag at iommu_domain level, but since in all cases the behaviour
is effectively global rather than actually per-domain, it seemed
reasonable to take it a step further. This passes compile-testing for
arm64 and x86, what do you think?

Robin.

->8-
Subject: [PATCH] iommu: Consolidate strict invalidation handling

Now that everyone is using iommu-dma, the global invalidation policy
really doesn't need to be woven through several parts of the core API
and individual drivers; we can just look it up directly at the one point
where we now make the flush queue decision. If the x86 drivers reflect
their internal options and overrides back to iommu_dma_strict, that can
become the canonical source.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/amd/iommu.c   |  2 ++
 drivers/iommu/dma-iommu.c   |  8 +---
 drivers/iommu/intel/iommu.c | 12 
 drivers/iommu/iommu.c   | 35 +++
 include/linux/iommu.h   |  2 ++
 5 files changed, 44 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a69a8b573e40..1db29e59d468 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1856,6 +1856,8 @@ int __init amd_iommu_init_dma_ops(void)
else
pr_info("Lazy IO/TLB flushing enabled\n");
 
+	iommu_set_dma_strict(amd_iommu_unmap_flush);

+
return 0;
 
 }

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index af765c813cc8..789a950cc125 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -304,10 +304,6 @@ static void iommu_dma_flush_iotlb_all(struct iova_domain 
*iovad)
 
 	cookie = container_of(iovad, struct iommu_dma_cookie, iovad);

domain = cookie->fq_domain;
-   /*
-* The IOMMU driver supporting DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
-* implies that ops->flush_iotlb_all must be non-NULL.
-*/
domain->ops->flush_iotlb_all(domain);
 }
 
@@ -334,7 +330,6 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,

struct iommu_dma_cookie *cookie = domain->iova_cookie;
unsigned long order, base_pfn;
struct iova_domain *iovad;
-   int attr;
 
 	if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE)

return -EINVAL;
@@ -371,8 +366,7 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
init_iova_domain(iovad, 1UL << order, base_pfn);
 
 	if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) &&

-   !iommu_domain_get_attr(domain, DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE, &attr) &&
-   attr) {
+   domain->ops->flush_iotlb_all && !iommu_get_dma_strict()) {
if (init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all,
  iommu_dma_entry_dtor))
pr_warn("iova flush queue initialization failed\n");
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index b5c746f0f63b..f5b452cd1266 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4377,6 +4377,17 @@ int __init intel_iommu_init(void)
 
  	down_read(&dmar_global_lock);

for_each_active_iommu(iommu, drhd) {
+   if (!intel_iommu_strict && cap_caching_mode(iommu->cap)) {
+   /*
+* The flush queue implementation does not perform 
page-selective
+

Re: [PATCH 16/17] iommu: remove DOMAIN_ATTR_IO_PGTABLE_CFG

2021-03-04 Thread Robin Murphy

On 2021-03-01 08:42, Christoph Hellwig wrote:

Signed-off-by: Christoph Hellwig 


Moreso than the previous patch, where the feature is at least relatively 
generic (note that there's a bunch of in-flight development around 
DOMAIN_ATTR_NESTING), I'm really not convinced that it's beneficial to 
bloat the generic iommu_ops structure with private driver-specific 
interfaces. The attribute interface is a great compromise for these 
kinds of things, and you can easily add type-checked wrappers around it 
for external callers (maybe even make the actual attributes internal 
between the IOMMU core and drivers) if that's your concern.
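
For example, a type-checked wrapper along these lines (sketch, illustrative name) would keep the attribute plumbing internal while giving external callers something safer than a void pointer:

static inline int iommu_domain_set_pgtable_cfg(struct iommu_domain *domain,
					       struct io_pgtable_domain_attr *cfg)
{
	/* The attribute mechanism stays a detail between core and drivers. */
	return iommu_domain_set_attr(domain, DOMAIN_ATTR_IO_PGTABLE_CFG, cfg);
}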


Robin.


---
  drivers/gpu/drm/msm/adreno/adreno_gpu.c |  2 +-
  drivers/iommu/arm/arm-smmu/arm-smmu.c   | 40 +++--
  drivers/iommu/iommu.c   |  9 ++
  include/linux/iommu.h   |  9 +-
  4 files changed, 29 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 0f184c3dd9d9ec..78d98ab2ee3a68 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -191,7 +191,7 @@ void adreno_set_llc_attributes(struct iommu_domain *iommu)
struct io_pgtable_domain_attr pgtbl_cfg;
  
  	pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_ARM_OUTER_WBWA;

-   iommu_domain_set_attr(iommu, DOMAIN_ATTR_IO_PGTABLE_CFG, &pgtbl_cfg);
+   iommu_domain_set_pgtable_attr(iommu, &pgtbl_cfg);
  }
  
  struct msm_gem_address_space *

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 2e17d990d04481..2858999c86dfd1 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1515,40 +1515,22 @@ static int arm_smmu_domain_enable_nesting(struct 
iommu_domain *domain)
return ret;
  }
  
-static int arm_smmu_domain_set_attr(struct iommu_domain *domain,

-   enum iommu_attr attr, void *data)
+static int arm_smmu_domain_set_pgtable_attr(struct iommu_domain *domain,
+   struct io_pgtable_domain_attr *pgtbl_cfg)
  {
-   int ret = 0;
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   int ret = -EPERM;
  
-	mutex_lock(&smmu_domain->init_mutex);

-
-   switch(domain->type) {
-   case IOMMU_DOMAIN_UNMANAGED:
-   switch (attr) {
-   case DOMAIN_ATTR_IO_PGTABLE_CFG: {
-   struct io_pgtable_domain_attr *pgtbl_cfg = data;
-
-   if (smmu_domain->smmu) {
-   ret = -EPERM;
-   goto out_unlock;
-   }
+   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
+   return -EINVAL;
  
-			smmu_domain->pgtbl_cfg = *pgtbl_cfg;

-   break;
-   }
-   default:
-   ret = -ENODEV;
-   }
-   break;
-   case IOMMU_DOMAIN_DMA:
-   ret = -ENODEV;
-   break;
-   default:
-   ret = -EINVAL;
+   mutex_lock(&smmu_domain->init_mutex);
+   if (!smmu_domain->smmu) {
+   smmu_domain->pgtbl_cfg = *pgtbl_cfg;
+   ret = 0;
}
-out_unlock:
	mutex_unlock(&smmu_domain->init_mutex);
+
return ret;
  }
  
@@ -1609,7 +1591,7 @@ static struct iommu_ops arm_smmu_ops = {

.device_group   = arm_smmu_device_group,
.dma_use_flush_queue= arm_smmu_dma_use_flush_queue,
.dma_enable_flush_queue = arm_smmu_dma_enable_flush_queue,
-   .domain_set_attr= arm_smmu_domain_set_attr,
+   .domain_set_pgtable_attr = arm_smmu_domain_set_pgtable_attr,
.domain_enable_nesting  = arm_smmu_domain_enable_nesting,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2e9e058501a953..8490aefd4b41f8 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2693,6 +2693,15 @@ int iommu_domain_enable_nesting(struct iommu_domain 
*domain)
  }
  EXPORT_SYMBOL_GPL(iommu_domain_enable_nesting);
  
+int iommu_domain_set_pgtable_attr(struct iommu_domain *domain,

+   struct io_pgtable_domain_attr *pgtbl_cfg)
+{
+   if (!domain->ops->domain_set_pgtable_attr)
+   return -EINVAL;
+   return domain->ops->domain_set_pgtable_attr(domain, pgtbl_cfg);
+}
+EXPORT_SYMBOL_GPL(iommu_domain_set_pgtable_attr);
+
  void iommu_get_resv_regions(struct device *dev, struct list_head *list)
  {
const struct iommu_ops *ops = dev->bus->iommu_ops;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index aed88aa3bd3edf..39d3ed4d2700ac 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -40,6 +40,7 @@ struct iommu_domain;
  struct notifier_block;
  struct iommu_sva;
  struct iommu_fault_event;
+struct io_pgtable_domain_attr;

Re: [PATCH 14/17] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE

2021-03-04 Thread Robin Murphy

On 2021-03-01 08:42, Christoph Hellwig wrote:

Use explicit methods for setting and querying the information instead.


Now that everyone's using iommu-dma, is there any point in bouncing this 
through the drivers at all? Seems like it would make more sense for the 
x86 drivers to reflect their private options back to iommu_dma_strict 
(and allow Intel's caching mode to override it as well), then have 
iommu_dma_init_domain just test !iommu_dma_strict && 
domain->ops->flush_iotlb_all.


Robin.


Also remove the now unused iommu_domain_get_attr functionality.

Signed-off-by: Christoph Hellwig 
---
  drivers/iommu/amd/iommu.c   | 23 ++---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 47 ++---
  drivers/iommu/arm/arm-smmu/arm-smmu.c   | 56 +
  drivers/iommu/dma-iommu.c   |  8 ++-
  drivers/iommu/intel/iommu.c | 27 ++
  drivers/iommu/iommu.c   | 19 +++
  include/linux/iommu.h   | 17 ++-
  7 files changed, 51 insertions(+), 146 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a69a8b573e40d0..37a8e51db17656 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1771,24 +1771,11 @@ static struct iommu_group 
*amd_iommu_device_group(struct device *dev)
return acpihid_device_group(dev);
  }
  
-static int amd_iommu_domain_get_attr(struct iommu_domain *domain,

-   enum iommu_attr attr, void *data)
+static bool amd_iommu_dma_use_flush_queue(struct iommu_domain *domain)
  {
-   switch (domain->type) {
-   case IOMMU_DOMAIN_UNMANAGED:
-   return -ENODEV;
-   case IOMMU_DOMAIN_DMA:
-   switch (attr) {
-   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   *(int *)data = !amd_iommu_unmap_flush;
-   return 0;
-   default:
-   return -ENODEV;
-   }
-   break;
-   default:
-   return -EINVAL;
-   }
+   if (domain->type != IOMMU_DOMAIN_DMA)
+   return false;
+   return !amd_iommu_unmap_flush;
  }
  
  /*

@@ -2257,7 +2244,7 @@ const struct iommu_ops amd_iommu_ops = {
.release_device = amd_iommu_release_device,
.probe_finalize = amd_iommu_probe_finalize,
.device_group = amd_iommu_device_group,
-   .domain_get_attr = amd_iommu_domain_get_attr,
+   .dma_use_flush_queue = amd_iommu_dma_use_flush_queue,
.get_resv_regions = amd_iommu_get_resv_regions,
.put_resv_regions = generic_iommu_put_resv_regions,
.is_attach_deferred = amd_iommu_is_attach_deferred,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8594b4a8304375..bf96172e8c1f71 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2449,33 +2449,21 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
return group;
  }
  
-static int arm_smmu_domain_get_attr(struct iommu_domain *domain,

-   enum iommu_attr attr, void *data)
+static bool arm_smmu_dma_use_flush_queue(struct iommu_domain *domain)
  {
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
  
-	switch (domain->type) {

-   case IOMMU_DOMAIN_UNMANAGED:
-   switch (attr) {
-   case DOMAIN_ATTR_NESTING:
-   *(int *)data = (smmu_domain->stage == 
ARM_SMMU_DOMAIN_NESTED);
-   return 0;
-   default:
-   return -ENODEV;
-   }
-   break;
-   case IOMMU_DOMAIN_DMA:
-   switch (attr) {
-   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   *(int *)data = smmu_domain->non_strict;
-   return 0;
-   default:
-   return -ENODEV;
-   }
-   break;
-   default:
-   return -EINVAL;
-   }
+   if (domain->type != IOMMU_DOMAIN_DMA)
+   return false;
+   return smmu_domain->non_strict;
+}
+
+
+static void arm_smmu_dma_enable_flush_queue(struct iommu_domain *domain)
+{
+   if (domain->type != IOMMU_DOMAIN_DMA)
+   return;
+   to_smmu_domain(domain)->non_strict = true;
  }
  
  static int arm_smmu_domain_set_attr(struct iommu_domain *domain,

@@ -2505,13 +2493,7 @@ static int arm_smmu_domain_set_attr(struct iommu_domain 
*domain,
}
break;
case IOMMU_DOMAIN_DMA:
-   switch(attr) {
-   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   smmu_domain->non_strict = *(int *)data;
-   break;
-   default:
-   ret = 

Re: [PATCH 0/8] Convert the intel iommu driver to the dma-iommu api

2020-08-26 Thread Robin Murphy

Hi Tom,

On 2019-12-21 15:03, Tom Murphy wrote:

This patchset converts the intel iommu driver to the dma-iommu api.

While converting the driver I exposed a bug in the intel i915 driver which 
causes a huge amount of artifacts on the screen of my laptop. You can see a 
picture of it here:
https://github.com/pippy360/kernelPatches/blob/master/IMG_20191219_225922.jpg

This issue is most likely in the i915 driver and is most likely caused by the 
driver not respecting the return value of the dma_map_ops::map_sg function. You 
can see the driver ignoring the return value here:
https://github.com/torvalds/linux/blob/7e0165b2f1a912a06e381e91f0f4e495f4ac3736/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c#L51
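
For reference, the DMA API contract is that dma_map_sg() may coalesce entries and return fewer DMA segments than the nents passed in, so callers must walk only the returned count - roughly (illustrative sketch, not the i915 code):

static int map_and_walk_sketch(struct device *dev, struct sg_table *sgt)
{
	struct scatterlist *sg;
	int nr, i;

	nr = dma_map_sg(dev, sgt->sgl, sgt->orig_nents, DMA_BIDIRECTIONAL);
	if (nr == 0)
		return -ENOMEM;

	/* Iterate nr mapped segments, not sgt->orig_nents. */
	for_each_sg(sgt->sgl, sg, nr, i) {
		/* program the device with sg_dma_address(sg) / sg_dma_len(sg) */
	}
	return 0;
}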

Previously this didn’t cause issues because the intel map_sg always returned 
the same number of elements as the input scatter gather list but with the 
change to this dma-iommu api this is no longer the case. I wasn’t able to track 
the bug down to a specific line of code unfortunately.

Could someone from the intel team look at this?


I have been testing on a lenovo x1 carbon 5th generation. Let me know if 
there’s any more information you need.

To allow my patch set to be tested I have added a patch (patch 8/8) in this 
series to disable combining sg segments in the dma-iommu api which fixes the 
bug but it doesn't fix the actual problem.

As part of this patch series I copied the intel bounce buffer code to the 
dma-iommu path. The addition of the bounce buffer code took me by surprise. I 
did most of my development on this patch series before the bounce buffer code 
was added and my reimplementation in the dma-iommu path is very rushed and not 
properly tested but I’m running out of time to work on this patch set.

On top of that I also didn’t port over the intel tracing code from this commit:
https://github.com/torvalds/linux/commit/3b53034c268d550d9e8522e613a14ab53b8840d8#diff-6b3e7c4993f05e76331e463ab1fc87e1
So all the work in that commit is now wasted. The code will need to be removed 
and reimplemented in the dma-iommu path. I would like to take the time to do 
this but I really don’t have the time at the moment and I want to get these 
changes out before the iommu code changes any more.


Further to what we just discussed at LPC, I've realised that tracepoints 
are actually something I could do with *right now* for debugging my Arm 
DMA ops series, so if I'm going to hack something up anyway I may as 
well take responsibility for polishing it into a proper patch as well :)


Robin.



Tom Murphy (8):
   iommu/vt-d: clean up 32bit si_domain assignment
   iommu/vt-d: Use default dma_direct_* mapping functions for direct
 mapped devices
   iommu/vt-d: Remove IOVA handling code from non-dma_ops path
   iommu: Handle freelists when using deferred flushing in iommu drivers
   iommu: Add iommu_dma_free_cpu_cached_iovas function
   iommu: allow the dma-iommu api to use bounce buffers
   iommu/vt-d: Convert intel iommu driver to the iommu ops
   DO NOT MERGE: iommu: disable list appending in dma-iommu

  drivers/iommu/Kconfig   |   1 +
  drivers/iommu/amd_iommu.c   |  14 +-
  drivers/iommu/arm-smmu-v3.c |   3 +-
  drivers/iommu/arm-smmu.c|   3 +-
  drivers/iommu/dma-iommu.c   | 183 +--
  drivers/iommu/exynos-iommu.c|   3 +-
  drivers/iommu/intel-iommu.c | 936 
  drivers/iommu/iommu.c   |  39 +-
  drivers/iommu/ipmmu-vmsa.c  |   3 +-
  drivers/iommu/msm_iommu.c   |   3 +-
  drivers/iommu/mtk_iommu.c   |   3 +-
  drivers/iommu/mtk_iommu_v1.c|   3 +-
  drivers/iommu/omap-iommu.c  |   3 +-
  drivers/iommu/qcom_iommu.c  |   3 +-
  drivers/iommu/rockchip-iommu.c  |   3 +-
  drivers/iommu/s390-iommu.c  |   3 +-
  drivers/iommu/tegra-gart.c  |   3 +-
  drivers/iommu/tegra-smmu.c  |   3 +-
  drivers/iommu/virtio-iommu.c|   3 +-
  drivers/vfio/vfio_iommu_type1.c |   2 +-
  include/linux/dma-iommu.h   |   3 +
  include/linux/intel-iommu.h |   1 -
  include/linux/iommu.h   |  32 +-
  23 files changed, 345 insertions(+), 908 deletions(-)



Re: [PATCH V2 1/2] Add new flush_iotlb_range and handle freelists when using iommu_unmap_fast

2020-08-18 Thread Robin Murphy

On 2020-08-18 07:04, Tom Murphy wrote:

Add a flush_iotlb_range to allow flushing of an iova range instead of a
full flush in the dma-iommu path.

Allow the iommu_unmap_fast to return newly freed page table pages and
pass the freelist to queue_iova in the dma-iommu ops path.

This patch is useful for iommu drivers (in this case the intel iommu
driver) which need to wait for the ioTLB to be flushed before newly
free/unmapped page table pages can be freed. This way we can still batch
ioTLB free operations and handle the freelists.


It sounds like the freelist is something that logically belongs in the 
iommu_iotlb_gather structure. And even if it's not a perfect fit I'd be 
inclined to jam it in there anyway just to avoid this giant argument 
explosion ;)
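
i.e. something along the lines of (sketch; the new field name is illustrative):

struct iommu_iotlb_gather {
	unsigned long	start;
	unsigned long	end;
	size_t		pgsize;
	/* Pagetable pages to free once the IOTLB sync has completed,
	 * instead of threading a new parameter through every unmap path. */
	struct page	*freelist;
};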


Why exactly do we need to introduce a new flush_iotlb_range() op? Can't 
the AMD driver simply use the gather mechanism like everyone else?


Robin.


Change-log:
V2:
-fix missing parameter in mtk_iommu_v1.c

Signed-off-by: Tom Murphy 
---
  drivers/iommu/amd/iommu.c   | 14 -
  drivers/iommu/arm-smmu-v3.c |  3 +-
  drivers/iommu/arm-smmu.c|  3 +-
  drivers/iommu/dma-iommu.c   | 45 ---
  drivers/iommu/exynos-iommu.c|  3 +-
  drivers/iommu/intel/iommu.c | 54 +
  drivers/iommu/iommu.c   | 25 +++
  drivers/iommu/ipmmu-vmsa.c  |  3 +-
  drivers/iommu/msm_iommu.c   |  3 +-
  drivers/iommu/mtk_iommu.c   |  3 +-
  drivers/iommu/mtk_iommu_v1.c|  3 +-
  drivers/iommu/omap-iommu.c  |  3 +-
  drivers/iommu/qcom_iommu.c  |  3 +-
  drivers/iommu/rockchip-iommu.c  |  3 +-
  drivers/iommu/s390-iommu.c  |  3 +-
  drivers/iommu/sun50i-iommu.c|  3 +-
  drivers/iommu/tegra-gart.c  |  3 +-
  drivers/iommu/tegra-smmu.c  |  3 +-
  drivers/iommu/virtio-iommu.c|  3 +-
  drivers/vfio/vfio_iommu_type1.c |  2 +-
  include/linux/iommu.h   | 21 +++--
  21 files changed, 150 insertions(+), 56 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 2f22326ee4df..25fbacab23c3 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2513,7 +2513,8 @@ static int amd_iommu_map(struct iommu_domain *dom, 
unsigned long iova,
  
  static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,

  size_t page_size,
- struct iommu_iotlb_gather *gather)
+ struct iommu_iotlb_gather *gather,
+ struct page **freelist)
  {
struct protection_domain *domain = to_pdomain(dom);
struct domain_pgtable pgtable;
@@ -2636,6 +2637,16 @@ static void amd_iommu_flush_iotlb_all(struct 
iommu_domain *domain)
	spin_unlock_irqrestore(&dom->lock, flags);
  }
  
+static void amd_iommu_flush_iotlb_range(struct iommu_domain *domain,

+   unsigned long iova, size_t size,
+   struct page *freelist)
+{
+   struct protection_domain *dom = to_pdomain(domain);
+
+   domain_flush_pages(dom, iova, size);
+   domain_flush_complete(dom);
+}
+
  static void amd_iommu_iotlb_sync(struct iommu_domain *domain,
 struct iommu_iotlb_gather *gather)
  {
@@ -2675,6 +2686,7 @@ const struct iommu_ops amd_iommu_ops = {
.is_attach_deferred = amd_iommu_is_attach_deferred,
.pgsize_bitmap  = AMD_IOMMU_PGSIZES,
.flush_iotlb_all = amd_iommu_flush_iotlb_all,
+   .flush_iotlb_range = amd_iommu_flush_iotlb_range,
.iotlb_sync = amd_iommu_iotlb_sync,
.def_domain_type = amd_iommu_def_domain_type,
  };
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index f578677a5c41..8d328dc25326 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2854,7 +2854,8 @@ static int arm_smmu_map(struct iommu_domain *domain, 
unsigned long iova,
  }
  
  static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,

-size_t size, struct iommu_iotlb_gather *gather)
+size_t size, struct iommu_iotlb_gather *gather,
+struct page **freelist)
  {
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4cb2705..0cd0dfc89875 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1234,7 +1234,8 @@ static int arm_smmu_map(struct iommu_domain *domain, 
unsigned long iova,
  }
  
  static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,

-size_t size, struct iommu_iotlb_gather *gather)
+size_t size, struct iommu_iotlb_gather *gather,
+struct page **freelist)

Re: [PATCH v3 00/34] iommu: Move iommu_group setup to IOMMU core code

2020-07-01 Thread Robin Murphy

On 2020-07-01 01:40, Qian Cai wrote:

Looks like this patchset introduced a use-after-free on arm-smmu-v3.

Reproduced using mlx5,

# echo 1 > /sys/class/net/enp11s0f1np1/device/sriov_numvfs
# echo 0 > /sys/class/net/enp11s0f1np1/device/sriov_numvfs

The .config,
https://github.com/cailca/linux-mm/blob/master/arm64.config

Looking at the free stack,

iommu_release_device->iommu_group_remove_device

was introduced in 07/34 ("iommu: Add probe_device() and release_device()
call-backs").


Right, iommu_group_remove_device can tear down the group and call 
->domain_free before the driver has any knowledge of the last device 
going away via the ->release_device call.


I guess the question is do we simply flip the call order in 
iommu_release_device() so drivers can easily clean up their internal 
per-device state first, or do we now want them to be robust against 
freeing domains with devices still nominally attached?
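
The first option would look roughly like this (sketch only, not a tested patch):

static void iommu_release_device_sketch(struct device *dev)
{
	const struct iommu_ops *ops = dev->bus->iommu_ops;

	if (!dev->iommu_group)
		return;

	/* Per-device driver cleanup first, while the group (and hence any
	 * default domain) still exists... */
	ops->release_device(dev);
	/* ...then group/domain teardown. */
	iommu_group_remove_device(dev);
}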


Robin.


Re: [RFC PATCH 17/34] iommu/arm-smmu: Store device instead of group in arm_smmu_s2cr

2020-04-08 Thread Robin Murphy

On 2020-04-08 3:37 pm, Joerg Roedel wrote:

Hi Robin,

thanks for looking into this.

On Wed, Apr 08, 2020 at 01:09:40PM +0100, Robin Murphy wrote:

For a hot-pluggable bus where logical devices may share Stream IDs (like
fsl-mc), this could happen:

   create device A
   iommu_probe_device(A)
 iommu_device_group(A) -> alloc group X
   create device B
   iommu_probe_device(B)
 iommu_device_group(A) -> lookup returns group X
   ...
   iommu_remove_device(A)
   delete device A
   create device C
   iommu_probe_device(C)
 iommu_device_group(C) -> use-after-free of A

Preserving the logical behaviour here would probably look *something* like
the mangled diff below, but I haven't thought it through 100%.


Yeah, I think you are right. How about just moving the loop which sets
s2crs[idx].group to arm_smmu_device_group()? In that case I can drop
this patch and leave the group pointer in place.


Isn't that exactly what I suggested? :)

I don't recall for sure, but knowing me, that bit of group bookkeeping 
is only where it currently is because it cheekily saves iterating the 
IDs a second time. I don't think there's any technical reason.


Robin.


Re: [RFC PATCH 17/34] iommu/arm-smmu: Store device instead of group in arm_smmu_s2cr

2020-04-08 Thread Robin Murphy

On 2020-04-07 7:37 pm, Joerg Roedel wrote:

From: Joerg Roedel 

This is required to convert the arm-smmu driver to the
probe/release_device() interface.

Signed-off-by: Joerg Roedel 
---
  drivers/iommu/arm-smmu.c | 14 +-
  1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index a6a5796e9c41..3493501d8b2c 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -69,7 +69,7 @@ MODULE_PARM_DESC(disable_bypass,
"Disable bypass streams such that incoming transactions from devices that 
are not attached to an iommu domain will report an abort back to the device and will not 
be allowed to pass through the SMMU.");
  
  struct arm_smmu_s2cr {

-   struct iommu_group  *group;
+   struct device   *dev;
int count;
enum arm_smmu_s2cr_type type;
enum arm_smmu_s2cr_privcfg  privcfg;
@@ -1100,7 +1100,7 @@ static int arm_smmu_master_alloc_smes(struct device *dev)
/* It worked! Now, poke the actual hardware */
for_each_cfg_sme(cfg, fwspec, i, idx) {
arm_smmu_write_sme(smmu, idx);
-   smmu->s2crs[idx].group = group;
+   smmu->s2crs[idx].dev = dev;
}
  
  	mutex_unlock(&smmu->stream_map_mutex);

@@ -1495,11 +1495,15 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
int i, idx;
  
  	for_each_cfg_sme(cfg, fwspec, i, idx) {

-   if (group && smmu->s2crs[idx].group &&
-   group != smmu->s2crs[idx].group)
+   struct iommu_group *idx_grp = NULL;
+
+   if (smmu->s2crs[idx].dev)
+   idx_grp = smmu->s2crs[idx].dev->iommu_group;


For a hot-pluggable bus where logical devices may share Stream IDs (like 
fsl-mc), this could happen:


  create device A
  iommu_probe_device(A)
iommu_device_group(A) -> alloc group X
  create device B
  iommu_probe_device(B)
iommu_device_group(A) -> lookup returns group X
  ...
  iommu_remove_device(A)
  delete device A
  create device C
  iommu_probe_device(C)
iommu_device_group(C) -> use-after-free of A

Preserving the logical behaviour here would probably look *something* 
like the mangled diff below, but I haven't thought it through 100%.


Robin.

->8-
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 16c4b87af42b..e88612ee47fe 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1100,10 +1100,8 @@ static int arm_smmu_master_alloc_smes(struct 
device *dev)

iommu_group_put(group);

/* It worked! Now, poke the actual hardware */
-   for_each_cfg_sme(fwspec, i, idx) {
+   for_each_cfg_sme(fwspec, i, idx)
arm_smmu_write_sme(smmu, idx);
-   smmu->s2crs[idx].group = group;
-   }

	mutex_unlock(&smmu->stream_map_mutex);
return 0;
@@ -1500,15 +1498,17 @@ static struct iommu_group 
*arm_smmu_device_group(struct device *dev)

}

if (group)
-   return iommu_group_ref_get(group);
-
-   if (dev_is_pci(dev))
+   iommu_group_ref_get(group);
+   else if (dev_is_pci(dev))
group = pci_device_group(dev);
else if (dev_is_fsl_mc(dev))
group = fsl_mc_device_group(dev);
else
group = generic_device_group(dev);

+   for_each_cfg_sme(fwspec, i, idx)
+   smmu->s2crs[idx].group = group;
+
return group;
 }


Re: [RFC PATCH v2] iommu/virtio: Use page size bitmap supported by endpoint

2020-04-01 Thread Robin Murphy

On 2020-04-01 12:38 pm, Bharat Bhushan wrote:

Different endpoints can support different page sizes. Probe each
endpoint to see whether it reports a specific page size mask,
otherwise fall back to the global page sizes.

Signed-off-by: Bharat Bhushan 
---
  drivers/iommu/virtio-iommu.c  | 33 +++
  include/uapi/linux/virtio_iommu.h |  7 +++
  2 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index cce329d71fba..c794cb5b7b3e 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -78,6 +78,7 @@ struct viommu_endpoint {
struct viommu_dev   *viommu;
struct viommu_domain*vdomain;
	struct list_head		resv_regions;
+   u64 pgsize_bitmap;
  };
  
  struct viommu_request {

@@ -415,6 +416,20 @@ static int viommu_replay_mappings(struct viommu_domain 
*vdomain)
return ret;
  }
  
+static int viommu_set_pgsize_bitmap(struct viommu_endpoint *vdev,

+   struct virtio_iommu_probe_pgsize_mask *mask,
+   size_t len)
+
+{
+   u64 pgsize_bitmap = le64_to_cpu(mask->pgsize_bitmap);
+
+   if (len < sizeof(*mask))
+   return -EINVAL;
+
+   vdev->pgsize_bitmap = pgsize_bitmap;
+   return 0;
+}
+
  static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
   struct virtio_iommu_probe_resv_mem *mem,
   size_t len)
@@ -494,11 +509,13 @@ static int viommu_probe_endpoint(struct viommu_dev 
*viommu, struct device *dev)
while (type != VIRTIO_IOMMU_PROBE_T_NONE &&
   cur < viommu->probe_size) {
len = le16_to_cpu(prop->length) + sizeof(*prop);
-
switch (type) {
case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
ret = viommu_add_resv_mem(vdev, (void *)prop, len);
break;
+   case VIRTIO_IOMMU_PROBE_T_PAGE_SIZE_MASK:
+   ret = viommu_set_pgsize_bitmap(vdev, (void *)prop, len);
+   break;
default:
dev_err(dev, "unknown viommu prop 0x%x\n", type);
}
@@ -607,16 +624,23 @@ static struct iommu_domain *viommu_domain_alloc(unsigned 
type)
	return &vdomain->domain;
  }
  
-static int viommu_domain_finalise(struct viommu_dev *viommu,

+static int viommu_domain_finalise(struct viommu_endpoint *vdev,
  struct iommu_domain *domain)
  {
int ret;
struct viommu_domain *vdomain = to_viommu_domain(domain);
+   struct viommu_dev *viommu = vdev->viommu;
  
  	vdomain->viommu		= viommu;

vdomain->map_flags   = viommu->map_flags;
  
-	domain->pgsize_bitmap	= viommu->pgsize_bitmap;

+   /* Devices in same domain must support same size pages */


AFAICS what the code appears to do is enforce that the first endpoint 
attached to any domain has the same pgsize_bitmap as the most recently 
probed viommu_dev instance, then ignore any subsequent endpoints 
attached to the same domain. Thus I'm not sure that comment is accurate.


Robin.


+   if ((domain->pgsize_bitmap != viommu->pgsize_bitmap) &&
+   (domain->pgsize_bitmap != vdev->pgsize_bitmap))
+   return -EINVAL;
+
+   domain->pgsize_bitmap = vdev->pgsize_bitmap;
+
domain->geometry = viommu->geometry;
  
	ret = ida_alloc_range(&viommu->domain_ids, viommu->first_domain,

@@ -657,7 +681,7 @@ static int viommu_attach_dev(struct iommu_domain *domain, 
struct device *dev)
 * Properly initialize the domain now that we know which viommu
 * owns it.
 */
-   ret = viommu_domain_finalise(vdev->viommu, domain);
+   ret = viommu_domain_finalise(vdev, domain);
} else if (vdomain->viommu != vdev->viommu) {
dev_err(dev, "cannot attach to foreign vIOMMU\n");
ret = -EXDEV;
@@ -875,6 +899,7 @@ static int viommu_add_device(struct device *dev)
  
  	vdev->dev = dev;

vdev->viommu = viommu;
+   vdev->pgsize_bitmap = viommu->pgsize_bitmap;
	INIT_LIST_HEAD(&vdev->resv_regions);
fwspec->iommu_priv = vdev;
  
diff --git a/include/uapi/linux/virtio_iommu.h b/include/uapi/linux/virtio_iommu.h

index 237e36a280cb..dc9d3f40bcd8 100644
--- a/include/uapi/linux/virtio_iommu.h
+++ b/include/uapi/linux/virtio_iommu.h
@@ -111,6 +111,7 @@ struct virtio_iommu_req_unmap {
  
  #define VIRTIO_IOMMU_PROBE_T_NONE		0

  #define VIRTIO_IOMMU_PROBE_T_RESV_MEM 1
+#define VIRTIO_IOMMU_PROBE_T_PAGE_SIZE_MASK2
  
  #define VIRTIO_IOMMU_PROBE_T_MASK		0xfff
  
@@ -119,6 +120,12 @@ struct virtio_iommu_probe_property {

__le16  length;
  };
  
+struct virtio_iommu_probe_pgsize_mask {

+   struct virtio_iommu_probe_property  head;
+

Re: [PATCH v2 3/3] iommu/virtio: Reject IOMMU page granule larger than PAGE_SIZE

2020-03-26 Thread Robin Murphy

On 2020-03-26 9:35 am, Jean-Philippe Brucker wrote:

We don't currently support IOMMUs with a page granule larger than the
system page size. The IOVA allocator has a BUG_ON() in this case, and
VFIO has a WARN_ON().

Removing these obstacles doesn't seem possible without major
changes to the DMA API and VFIO. Some callers of iommu_map(), for
example, want to map multiple page-aligned regions adjacent to each
other for scatter-gather purposes. Even in simple DMA API uses, a call
to dma_map_page() would let the endpoint access neighbouring memory. And
VFIO users cannot ensure that their virtual address buffer is physically
contiguous at the IOMMU granule.

Rather than triggering the IOVA BUG_ON() on mismatched page sizes, abort
the vdomain finalise() with an error message. We could simply abort the
viommu probe(), but an upcoming extension to virtio-iommu will allow
setting different page masks for each endpoint.


Reviewed-by: Robin Murphy 


Reported-by: Bharat Bhushan 
Signed-off-by: Jean-Philippe Brucker 
---
v1->v2: Move to vdomain_finalise(), improve commit message
---
  drivers/iommu/virtio-iommu.c | 14 --
  1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 5eed75cd121f..750f69c49b95 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -607,12 +607,22 @@ static struct iommu_domain *viommu_domain_alloc(unsigned 
type)
	return &vdomain->domain;
  }
  
-static int viommu_domain_finalise(struct viommu_dev *viommu,

+static int viommu_domain_finalise(struct viommu_endpoint *vdev,
  struct iommu_domain *domain)
  {
int ret;
+   unsigned long viommu_page_size;
+   struct viommu_dev *viommu = vdev->viommu;
struct viommu_domain *vdomain = to_viommu_domain(domain);
  
+	viommu_page_size = 1UL << __ffs(viommu->pgsize_bitmap);

+   if (viommu_page_size > PAGE_SIZE) {
+   dev_err(vdev->dev,
+   "granule 0x%lx larger than system page size 0x%lx\n",
+   viommu_page_size, PAGE_SIZE);
+   return -EINVAL;
+   }
+
	ret = ida_alloc_range(&viommu->domain_ids, viommu->first_domain,
  viommu->last_domain, GFP_KERNEL);
if (ret < 0)
@@ -659,7 +669,7 @@ static int viommu_attach_dev(struct iommu_domain *domain, 
struct device *dev)
 * Properly initialize the domain now that we know which viommu
 * owns it.
 */
-   ret = viommu_domain_finalise(vdev->viommu, domain);
+   ret = viommu_domain_finalise(vdev, domain);
} else if (vdomain->viommu != vdev->viommu) {
dev_err(dev, "cannot attach to foreign vIOMMU\n");
ret = -EXDEV;




Re: [PATCH v2 2/3] iommu/virtio: Fix freeing of incomplete domains

2020-03-26 Thread Robin Murphy

On 2020-03-26 9:35 am, Jean-Philippe Brucker wrote:

Calling viommu_domain_free() on a domain that hasn't been finalised (not
attached to any device, for example) can currently cause an Oops,
because we attempt to call ida_free() on ID 0, which may either be
unallocated or used by another domain.

Only initialise the vdomain->viommu pointer, which denotes a finalised
domain, at the end of a successful viommu_domain_finalise().


Reviewed-by: Robin Murphy 


Fixes: edcd69ab9a32 ("iommu: Add virtio-iommu driver")
Reported-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
  drivers/iommu/virtio-iommu.c | 16 +---
  1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index cce329d71fba..5eed75cd121f 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -613,18 +613,20 @@ static int viommu_domain_finalise(struct viommu_dev 
*viommu,
int ret;
struct viommu_domain *vdomain = to_viommu_domain(domain);
  
-	vdomain->viommu		= viommu;

-   vdomain->map_flags   = viommu->map_flags;
+   ret = ida_alloc_range(&viommu->domain_ids, viommu->first_domain,
+ viommu->last_domain, GFP_KERNEL);
+   if (ret < 0)
+   return ret;
+
+   vdomain->id  = (unsigned int)ret;
  
  	domain->pgsize_bitmap	= viommu->pgsize_bitmap;

domain->geometry = viommu->geometry;
  
-	ret = ida_alloc_range(&viommu->domain_ids,

- viommu->last_domain, GFP_KERNEL);
-   if (ret >= 0)
-   vdomain->id = (unsigned int)ret;
+   vdomain->map_flags   = viommu->map_flags;
+   vdomain->viommu  = viommu;
  
-	return ret > 0 ? 0 : ret;

+   return 0;
  }
  
  static void viommu_domain_free(struct iommu_domain *domain)





Re: [PATCH] iommu/virtio: Reject IOMMU page granule larger than PAGE_SIZE

2020-03-18 Thread Robin Murphy

On 2020-03-18 4:14 pm, Auger Eric wrote:

Hi,

On 3/18/20 1:00 PM, Robin Murphy wrote:

On 2020-03-18 11:40 am, Jean-Philippe Brucker wrote:

We don't currently support IOMMUs with a page granule larger than the
system page size. The IOVA allocator has a BUG_ON() in this case, and
VFIO has a WARN_ON().


Adding Alex in CC in case he has time to jump in. At the moment I don't
get why this WARN_ON() is here.

This was introduced in
c8dbca165bb090f926996a572ea2b5b577b34b70 vfio/iommu_type1: Avoid overflow



It might be possible to remove these obstacles if necessary. If the host
uses 64kB pages and the guest uses 4kB, then a device driver calling
alloc_page() followed by dma_map_page() will create a 64kB mapping for a
4kB physical page, allowing the endpoint to access the neighbouring 60kB
of memory. This problem could be worked around with bounce buffers.


FWIW the fundamental issue is that callers of iommu_map() may expect to
be able to map two or more page-aligned regions directly adjacent to
each other for scatter-gather purposes (or ring buffer tricks), and
that's just not possible if the IOMMU granule is too big. Bounce
buffering would be a viable workaround for the streaming DMA API and
certain similar use-cases, but not in general (e.g. coherent DMA, VFIO,
GPUs, etc.)
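
To illustrate with a deliberately contrived sketch - stitching two unrelated 4K pages into one contiguous IOVA range, as a scatter-gather consumer might. With a 4K granule both calls succeed and the device sees a single 8K buffer; with a 64K granule the second call can't even be expressed, since the second IOVA is no longer granule-aligned:

#include <linux/iommu.h>
#include <linux/mm.h>
#include <linux/sizes.h>

/* Illustrative only: make p0 and p1 appear contiguous to the device. */
static int map_adjacent_pages(struct iommu_domain *domain, unsigned long iova,
			      struct page *p0, struct page *p1, int prot)
{
	int ret;

	ret = iommu_map(domain, iova, page_to_phys(p0), SZ_4K, prot);
	if (ret)
		return ret;

	/* With a 64K granule this offset isn't granule-aligned at all. */
	return iommu_map(domain, iova + SZ_4K, page_to_phys(p1), SZ_4K, prot);
}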

Robin.


For the moment, rather than triggering the IOVA BUG_ON() on mismatched
page sizes, abort the virtio-iommu probe with an error message.


I understand this is introduced as a temporary solution, but it
sounds like an important limitation to me. For instance it will prevent
running a Fedora guest exposed to a virtio-iommu on a RHEL host.


As above, even if you bypassed all the warnings it wouldn't really work 
properly anyway. In all cases that wouldn't be considered broken, the 
underlying hardware IOMMUs should support the same set of granules as 
the CPUs (or at least the smallest one), so is it actually appropriate 
for RHEL to (presumably) expose a 64K granule in the first place, rather 
than "works with anything" 4K? And/or more generally is there perhaps a 
hole in the virtio-iommu spec WRT being able to negotiate page_size_mask 
for a particular granule if multiple options are available?


Robin.



Thanks

Eric


Reported-by: Bharat Bhushan 
Signed-off-by: Jean-Philippe Brucker 
---
   drivers/iommu/virtio-iommu.c | 9 +
   1 file changed, 9 insertions(+)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 6d4e3c2a2ddb..80d5d8f621ab 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -998,6 +998,7 @@ static int viommu_probe(struct virtio_device *vdev)
   struct device *parent_dev = vdev->dev.parent;
   struct viommu_dev *viommu = NULL;
    struct device *dev = &vdev->dev;
+    unsigned long viommu_page_size;
   u64 input_start = 0;
   u64 input_end = -1UL;
   int ret;
@@ -1028,6 +1029,14 @@ static int viommu_probe(struct virtio_device
*vdev)
   goto err_free_vqs;
   }
   +    viommu_page_size = 1UL << __ffs(viommu->pgsize_bitmap);
+    if (viommu_page_size > PAGE_SIZE) {
+    dev_err(dev, "granule 0x%lx larger than system page size
0x%lx\n",
+    viommu_page_size, PAGE_SIZE);
+    ret = -EINVAL;
+    goto err_free_vqs;
+    }
+
   viommu->map_flags = VIRTIO_IOMMU_MAP_F_READ |
VIRTIO_IOMMU_MAP_F_WRITE;
   viommu->last_domain = ~0U;
  






Re: [PATCH] iommu/virtio: Reject IOMMU page granule larger than PAGE_SIZE

2020-03-18 Thread Robin Murphy

On 2020-03-18 11:40 am, Jean-Philippe Brucker wrote:

We don't currently support IOMMUs with a page granule larger than the
system page size. The IOVA allocator has a BUG_ON() in this case, and
VFIO has a WARN_ON().

It might be possible to remove these obstacles if necessary. If the host
uses 64kB pages and the guest uses 4kB, then a device driver calling
alloc_page() followed by dma_map_page() will create a 64kB mapping for a
4kB physical page, allowing the endpoint to access the neighbouring 60kB
of memory. This problem could be worked around with bounce buffers.


FWIW the fundamental issue is that callers of iommu_map() may expect to 
be able to map two or more page-aligned regions directly adjacent to 
each other for scatter-gather purposes (or ring buffer tricks), and 
that's just not possible if the IOMMU granule is too big. Bounce 
buffering would be a viable workaround for the streaming DMA API and 
certain similar use-cases, but not in general (e.g. coherent DMA, VFIO, 
GPUs, etc.)


Robin.


For the moment, rather than triggering the IOVA BUG_ON() on mismatched
page sizes, abort the virtio-iommu probe with an error message.

Reported-by: Bharat Bhushan 
Signed-off-by: Jean-Philippe Brucker 
---
  drivers/iommu/virtio-iommu.c | 9 +
  1 file changed, 9 insertions(+)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 6d4e3c2a2ddb..80d5d8f621ab 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -998,6 +998,7 @@ static int viommu_probe(struct virtio_device *vdev)
struct device *parent_dev = vdev->dev.parent;
struct viommu_dev *viommu = NULL;
	struct device *dev = &vdev->dev;
+   unsigned long viommu_page_size;
u64 input_start = 0;
u64 input_end = -1UL;
int ret;
@@ -1028,6 +1029,14 @@ static int viommu_probe(struct virtio_device *vdev)
goto err_free_vqs;
}
  
+	viommu_page_size = 1UL << __ffs(viommu->pgsize_bitmap);

+   if (viommu_page_size > PAGE_SIZE) {
+   dev_err(dev, "granule 0x%lx larger than system page size 
0x%lx\n",
+   viommu_page_size, PAGE_SIZE);
+   ret = -EINVAL;
+   goto err_free_vqs;
+   }
+
viommu->map_flags = VIRTIO_IOMMU_MAP_F_READ | VIRTIO_IOMMU_MAP_F_WRITE;
viommu->last_domain = ~0U;
  




Re: [PATCH 3/3] iommu/virtio: Enable x86 support

2020-02-17 Thread Robin Murphy

On 17/02/2020 1:31 pm, Michael S. Tsirkin wrote:

On Mon, Feb 17, 2020 at 01:22:44PM +, Robin Murphy wrote:

On 17/02/2020 1:01 pm, Michael S. Tsirkin wrote:

On Mon, Feb 17, 2020 at 10:01:07AM +0100, Jean-Philippe Brucker wrote:

On Sun, Feb 16, 2020 at 04:50:33AM -0500, Michael S. Tsirkin wrote:

On Fri, Feb 14, 2020 at 04:57:11PM +, Robin Murphy wrote:

On 14/02/2020 4:04 pm, Jean-Philippe Brucker wrote:

With the built-in topology description in place, x86 platforms can now
use the virtio-iommu.

Signed-off-by: Jean-Philippe Brucker 
---
drivers/iommu/Kconfig | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 068d4e0e3541..adcbda44d473 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -508,8 +508,9 @@ config HYPERV_IOMMU
config VIRTIO_IOMMU
bool "Virtio IOMMU driver"
depends on VIRTIO=y
-   depends on ARM64
+   depends on (ARM64 || X86)
select IOMMU_API
+   select IOMMU_DMA


Can that have an "if X86" for clarity? AIUI it's not necessary for
virtio-iommu itself (and really shouldn't be), but is merely to satisfy the
x86 arch code's expectation that IOMMU drivers bring their own DMA ops,
right?

Robin.


In fact does not this work on any platform now?


There is ongoing work to use the generic IOMMU_DMA ops on X86. AMD IOMMU
has been converted recently [1] but VT-d still implements its own DMA ops
(conversion patches are on the list [2]). On Arm the arch Kconfig selects
IOMMU_DMA, and I assume we'll have the same on X86 once Tom's work is
complete. Until then I can add a "if X86" here for clarity.

Thanks,
Jean

[1] https://lore.kernel.org/linux-iommu/20190613223901.9523-1-murph...@tcd.ie/
[2] https://lore.kernel.org/linux-iommu/20191221150402.13868-1-murph...@tcd.ie/


What about others? E.g. PPC?


That was the point I was getting at - while iommu-dma should build just fine
for the likes of PPC, s390, 32-bit Arm, etc., they have no architecture code
to correctly wire up iommu_dma_ops to devices. Thus there's currently no
point pulling it in and pretending it's anything more than a waste of space
for architectures other than arm64 and x86. It's merely a historical
artefact of the x86 DMA API implementation that when the IOMMU drivers were
split out to form drivers/iommu they took some of their relevant arch code
with them.

Robin.



Rather than white-listing architectures, how about making the
architectures in question set some kind of symbol, and depend on it?


Umm, that's basically what we have already? Architectures that use 
iommu_dma_ops select IOMMU_DMA.


The only issue is the oddity of x86 treating IOMMU drivers as part of 
its arch code, which has never come up against a cross-architecture 
driver until now. Hence the options of either maintaining that paradigm 
and having the 'x86 arch code' aspect of this driver "select IOMMU_DMA 
if x86" such that it works out equivalent to AMD_IOMMU, or a more 
involved cleanup to move that responsibility out of 
drivers/iommu/Kconfig entirely and have arch/x86/Kconfig do something 
like "select IOMMU_DMA if IOMMU_API", as Jean suggested up-thread.


In the specific context of IOMMU_DMA we're not talking about any kind of 
white-list, merely a one-off special case for one particular architecture.


Robin.


Re: [PATCH 3/3] iommu/virtio: Enable x86 support

2020-02-17 Thread Robin Murphy

On 17/02/2020 1:01 pm, Michael S. Tsirkin wrote:

On Mon, Feb 17, 2020 at 10:01:07AM +0100, Jean-Philippe Brucker wrote:

On Sun, Feb 16, 2020 at 04:50:33AM -0500, Michael S. Tsirkin wrote:

On Fri, Feb 14, 2020 at 04:57:11PM +, Robin Murphy wrote:

On 14/02/2020 4:04 pm, Jean-Philippe Brucker wrote:

With the built-in topology description in place, x86 platforms can now
use the virtio-iommu.

Signed-off-by: Jean-Philippe Brucker 
---
   drivers/iommu/Kconfig | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 068d4e0e3541..adcbda44d473 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -508,8 +508,9 @@ config HYPERV_IOMMU
   config VIRTIO_IOMMU
bool "Virtio IOMMU driver"
depends on VIRTIO=y
-   depends on ARM64
+   depends on (ARM64 || X86)
select IOMMU_API
+   select IOMMU_DMA


Can that have an "if X86" for clarity? AIUI it's not necessary for
virtio-iommu itself (and really shouldn't be), but is merely to satisfy the
x86 arch code's expectation that IOMMU drivers bring their own DMA ops,
right?

Robin.


In fact does not this work on any platform now?


There is ongoing work to use the generic IOMMU_DMA ops on X86. AMD IOMMU
has been converted recently [1] but VT-d still implements its own DMA ops
(conversion patches are on the list [2]). On Arm the arch Kconfig selects
IOMMU_DMA, and I assume we'll have the same on X86 once Tom's work is
complete. Until then I can add a "if X86" here for clarity.

Thanks,
Jean

[1] https://lore.kernel.org/linux-iommu/20190613223901.9523-1-murph...@tcd.ie/
[2] https://lore.kernel.org/linux-iommu/20191221150402.13868-1-murph...@tcd.ie/


What about others? E.g. PPC?


That was the point I was getting at - while iommu-dma should build just 
fine for the likes of PPC, s390, 32-bit Arm, etc., they have no 
architecture code to correctly wire up iommu_dma_ops to devices. Thus 
there's currently no point pulling it in and pretending it's anything 
more than a waste of space for architectures other than arm64 and x86. 
It's merely a historical artefact of the x86 DMA API implementation that 
when the IOMMU drivers were split out to form drivers/iommu they took 
some of their relevant arch code with them.


Robin.


Re: [PATCH 2/3] PCI: Add DMA configuration for virtual platforms

2020-02-14 Thread Robin Murphy

On 14/02/2020 4:04 pm, Jean-Philippe Brucker wrote:

Hardware platforms usually describe the IOMMU topology using either
device-tree pointers or vendor-specific ACPI tables.  For virtual
platforms that don't provide a device-tree, the virtio-iommu device
contains a description of the endpoints it manages.  That information
allows us to probe endpoints after the IOMMU is probed (possibly as late
as userspace modprobe), provided it is discovered early enough.

Add a hook to pci_dma_configure(), which returns -EPROBE_DEFER if the
endpoint is managed by a vIOMMU that will be loaded later, or 0 in any
other case to avoid disturbing the normal DMA configuration methods.
When CONFIG_VIRTIO_IOMMU_TOPOLOGY isn't selected, the call to
virt_dma_configure() is compiled out.

As long as the information is consistent, platforms can provide both a
device-tree and a built-in topology, and the IOMMU infrastructure is
able to deal with multiple DMA configuration methods.


Urgh, it's already been established[1] that having IOMMU setup tied to 
DMA configuration at driver probe time is not just conceptually wrong 
but actually broken, so the concept here worries me a bit. In a world 
where of_iommu_configure() and friends are being called much earlier 
around iommu_probe_device() time, how badly will this fall apart?


Robin.

[1] 
https://lore.kernel.org/linux-iommu/9625faf4-48ef-2dd3-d82f-931d9cf26...@huawei.com/



Signed-off-by: Jean-Philippe Brucker 
---
  drivers/pci/pci-driver.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 0454ca0e4e3f..69303a814f21 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -18,6 +18,7 @@
  #include 
  #include 
  #include 
+#include 
  #include "pci.h"
  #include "pcie/portdrv.h"
  
@@ -1602,6 +1603,10 @@ static int pci_dma_configure(struct device *dev)

struct device *bridge;
int ret = 0;
  
+	ret = virt_dma_configure(dev);

+   if (ret)
+   return ret;
+
bridge = pci_get_host_bridge_device(to_pci_dev(dev));
  
  	if (IS_ENABLED(CONFIG_OF) && bridge->parent &&





Re: [PATCH 3/3] iommu/virtio: Enable x86 support

2020-02-14 Thread Robin Murphy

On 14/02/2020 4:04 pm, Jean-Philippe Brucker wrote:

With the built-in topology description in place, x86 platforms can now
use the virtio-iommu.

Signed-off-by: Jean-Philippe Brucker 
---
  drivers/iommu/Kconfig | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 068d4e0e3541..adcbda44d473 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -508,8 +508,9 @@ config HYPERV_IOMMU
  config VIRTIO_IOMMU
bool "Virtio IOMMU driver"
depends on VIRTIO=y
-   depends on ARM64
+   depends on (ARM64 || X86)
select IOMMU_API
+   select IOMMU_DMA


Can that have an "if X86" for clarity? AIUI it's not necessary for 
virtio-iommu itself (and really shouldn't be), but is merely to satisfy 
the x86 arch code's expectation that IOMMU drivers bring their own DMA 
ops, right?


Robin.


select INTERVAL_TREE
help
  Para-virtualised IOMMU driver with virtio.




Re: [PATCH 0/8] Convert the intel iommu driver to the dma-iommu api

2019-12-23 Thread Robin Murphy

On 2019-12-23 10:37 am, Jani Nikula wrote:

On Sat, 21 Dec 2019, Tom Murphy  wrote:

This patchset converts the intel iommu driver to the dma-iommu api.

While converting the driver I exposed a bug in the intel i915 driver
which causes a huge amount of artifacts on the screen of my
laptop. You can see a picture of it here:
https://github.com/pippy360/kernelPatches/blob/master/IMG_20191219_225922.jpg

This issue is most likely in the i915 driver and is most likely caused
by the driver not respecting the return value of the
dma_map_ops::map_sg function. You can see the driver ignoring the
return value here:
https://github.com/torvalds/linux/blob/7e0165b2f1a912a06e381e91f0f4e495f4ac3736/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c#L51

Previously this didn’t cause issues because the intel map_sg always
returned the same number of elements as the input scatter gather list
but with the change to this dma-iommu api this is no longer the
case. I wasn’t able to track the bug down to a specific line of code
unfortunately.

Could someone from the intel team look at this?


Let me get this straight. There is current API that on success always
returns the same number of elements as the input scatter gather
list. You propose to change the API so that this is no longer the case?


No, the API for dma_map_sg() has always been that it may return fewer 
DMA segments than nents - see Documentation/DMA-API.txt (and otherwise, 
the return value would surely be a simple success/fail condition). 
Relying on a particular implementation behaviour has never been strictly 
correct, even if it does happen to be a very common behaviour.
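
For reference, the pattern the documentation asks for looks something like the below, where setup_descriptor() is a purely illustrative stand-in for whatever the driver does with each hardware segment - program the device from the returned count, and unmap with the original nents:

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/* Illustrative stand-in: program one hardware descriptor. */
static void setup_descriptor(dma_addr_t addr, unsigned int len)
{
}

static int map_sg_for_device(struct device *dev, struct scatterlist *sgl,
			     int nents)
{
	struct scatterlist *sg;
	int count, i;

	count = dma_map_sg(dev, sgl, nents, DMA_TO_DEVICE);
	if (count == 0)
		return -ENOMEM;

	/* Only the first 'count' entries hold valid DMA address/length. */
	for_each_sg(sgl, sg, count, i)
		setup_descriptor(sg_dma_address(sg), sg_dma_len(sg));

	return count;
}

/* ...and on teardown, dma_unmap_sg(dev, sgl, nents, DMA_TO_DEVICE); */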



A quick check of various dma_map_sg() calls in the kernel seems to
indicate checking for 0 for errors and then ignoring the non-zero return
is a common pattern. Are you sure it's okay to make the change you're
proposing?


Various code uses tricks like just iterating the mapped list until the 
first segment with zero sg_dma_len(). Others may well simply have bugs.


Robin.


Anyway, due to the time of year and all, I'd like to ask you to file a
bug against i915 at [1] so this is not forgotten, and please let's not
merge the changes before this is resolved.


Thanks,
Jani.


[1] https://gitlab.freedesktop.org/drm/intel/issues/new




Re: [PATCH 1/2] dma-mapping: Add dma_addr_is_phys_addr()

2019-10-14 Thread Robin Murphy

On 14/10/2019 05:51, David Gibson wrote:

On Fri, Oct 11, 2019 at 06:25:18PM -0700, Ram Pai wrote:

From: Thiago Jung Bauermann 

In order to safely use the DMA API, virtio needs to know whether DMA
addresses are in fact physical addresses and for that purpose,
dma_addr_is_phys_addr() is introduced.

cc: Benjamin Herrenschmidt 
cc: David Gibson 
cc: Michael Ellerman 
cc: Paul Mackerras 
cc: Michael Roth 
cc: Alexey Kardashevskiy 
cc: Paul Burton 
cc: Robin Murphy 
cc: Bartlomiej Zolnierkiewicz 
cc: Marek Szyprowski 
cc: Christoph Hellwig 
Suggested-by: Michael S. Tsirkin 
Signed-off-by: Ram Pai 
Signed-off-by: Thiago Jung Bauermann 


The change itself looks ok, so

Reviewed-by: David Gibson 

However, I would like to see the commit message (and maybe the inline
comments) expanded a bit on what the distinction here is about.  Some
of the text from the next patch would be suitable, about DMA addresses
usually being in a different address space but not in the case of
bounce buffering.


Right, this needs a much tighter definition. "DMA address happens to be 
a valid physical address" is true of various IOMMU setups too, but I 
can't believe it's meaningful in such cases.


If what you actually want is "DMA is direct or SWIOTLB" - i.e. "DMA 
address is physical address of DMA data (not necessarily the original 
buffer)" - wouldn't dma_is_direct() suffice?


Robin.


---
  arch/powerpc/include/asm/dma-mapping.h | 21 +
  arch/powerpc/platforms/pseries/Kconfig |  1 +
  include/linux/dma-mapping.h| 20 
  kernel/dma/Kconfig |  3 +++
  4 files changed, 45 insertions(+)

diff --git a/arch/powerpc/include/asm/dma-mapping.h 
b/arch/powerpc/include/asm/dma-mapping.h
index 565d6f7..f92c0a4b 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -5,6 +5,8 @@
  #ifndef _ASM_DMA_MAPPING_H
  #define _ASM_DMA_MAPPING_H
  
+#include 

+
  static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
  {
/* We don't handle the NULL dev case for ISA for now. We could
@@ -15,4 +17,23 @@ static inline const struct dma_map_ops 
*get_arch_dma_ops(struct bus_type *bus)
return NULL;
  }
  
+#ifdef CONFIG_ARCH_HAS_DMA_ADDR_IS_PHYS_ADDR

+/**
+ * dma_addr_is_phys_addr - check whether a device DMA address is a physical
+ * address
+ * @dev:   device to check
+ *
+ * Returns %true if any DMA address for this device happens to also be a valid
+ * physical address (not necessarily of the same page).
+ */
+static inline bool dma_addr_is_phys_addr(struct device *dev)
+{
+   /*
+* Secure guests always use the SWIOTLB, therefore DMA addresses are
+* actually the physical address of the bounce buffer.
+*/
+   return is_secure_guest();
+}
+#endif
+
  #endif/* _ASM_DMA_MAPPING_H */
diff --git a/arch/powerpc/platforms/pseries/Kconfig 
b/arch/powerpc/platforms/pseries/Kconfig
index 9e35cdd..0108150 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -152,6 +152,7 @@ config PPC_SVM
select SWIOTLB
select ARCH_HAS_MEM_ENCRYPT
select ARCH_HAS_FORCE_DMA_UNENCRYPTED
+   select ARCH_HAS_DMA_ADDR_IS_PHYS_ADDR
help
 There are certain POWER platforms which support secure guests using
 the Protected Execution Facility, with the help of an Ultravisor
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index f7d1eea..6df5664 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -693,6 +693,26 @@ static inline bool dma_addressing_limited(struct device 
*dev)
dma_get_required_mask(dev);
  }
  
+#ifndef CONFIG_ARCH_HAS_DMA_ADDR_IS_PHYS_ADDR

+/**
+ * dma_addr_is_phys_addr - check whether a device DMA address is a physical
+ * address
+ * @dev:   device to check
+ *
+ * Returns %true if any DMA address for this device happens to also be a valid
+ * physical address (not necessarily of the same page).
+ */
+static inline bool dma_addr_is_phys_addr(struct device *dev)
+{
+   /*
+* Except in very specific setups, DMA addresses exist in a different
+* address space from CPU physical addresses and cannot be directly used
+* to reference system memory.
+*/
+   return false;
+}
+#endif
+
  #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
const struct iommu_ops *iommu, bool coherent);
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 9decbba..6209b46 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -51,6 +51,9 @@ config ARCH_HAS_DMA_MMAP_PGPROT
  config ARCH_HAS_FORCE_DMA_UNENCRYPTED
bool
  
+config ARCH_HAS_DMA_ADDR_IS_PHYS_ADDR

+   bool
+
  config DMA_NONCO

Re: [PATCH V5 4/5] iommu/dma-iommu: Use the dev->coherent_dma_mask

2019-08-19 Thread Robin Murphy

On 15/08/2019 12:09, Tom Murphy wrote:

Use the dev->coherent_dma_mask when allocating in the dma-iommu ops api.


Oops... I suppose technically that's my latent bug, but since we've all 
missed it so far, I doubt arm64 systems ever see any devices which 
actually have different masks.


Reviewed-by: Robin Murphy 


Signed-off-by: Tom Murphy 
---
  drivers/iommu/dma-iommu.c | 12 +++-
  1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 906b7fa14d3c..b9a3ab02434b 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -471,7 +471,7 @@ static void __iommu_dma_unmap(struct device *dev, 
dma_addr_t dma_addr,
  }
  
  static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,

-   size_t size, int prot)
+   size_t size, int prot, dma_addr_t dma_mask)
  {
struct iommu_domain *domain = iommu_get_dma_domain(dev);
struct iommu_dma_cookie *cookie = domain->iova_cookie;
@@ -484,7 +484,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, 
phys_addr_t phys,
  
  	size = iova_align(iovad, size + iova_off);
  
-	iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev);

+   iova = iommu_dma_alloc_iova(domain, size, dma_mask, dev);
if (!iova)
return DMA_MAPPING_ERROR;
  
@@ -735,7 +735,7 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,

int prot = dma_info_to_prot(dir, coherent, attrs);
dma_addr_t dma_handle;
  
-	dma_handle = __iommu_dma_map(dev, phys, size, prot);

+   dma_handle = __iommu_dma_map(dev, phys, size, prot, dma_get_mask(dev));
if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
dma_handle != DMA_MAPPING_ERROR)
arch_sync_dma_for_device(dev, phys, size, dir);
@@ -938,7 +938,8 @@ static dma_addr_t iommu_dma_map_resource(struct device 
*dev, phys_addr_t phys,
size_t size, enum dma_data_direction dir, unsigned long attrs)
  {
return __iommu_dma_map(dev, phys, size,
-   dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO);
+   dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO,
+   dma_get_mask(dev));
  }
  
  static void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,

@@ -1041,7 +1042,8 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
if (!cpu_addr)
return NULL;
  
-	*handle = __iommu_dma_map(dev, page_to_phys(page), size, ioprot);

+   *handle = __iommu_dma_map(dev, page_to_phys(page), size, ioprot,
+   dev->coherent_dma_mask);
if (*handle == DMA_MAPPING_ERROR) {
__iommu_dma_free(dev, size, cpu_addr);
return NULL;




Re: [PATCH V5 3/5] iommu/dma-iommu: Handle deferred devices

2019-08-19 Thread Robin Murphy

On 15/08/2019 12:09, Tom Murphy wrote:

Handle devices which defer their attach to the iommu in the dma-iommu api


Other than nitpicking the name (I'd lean towards something like 
iommu_dma_deferred_attach),


Reviewed-by: Robin Murphy 


Signed-off-by: Tom Murphy 
---
  drivers/iommu/dma-iommu.c | 27 ++-
  1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 2712fbc68b28..906b7fa14d3c 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -22,6 +22,7 @@
  #include 
  #include 
  #include 
+#include 
  
  struct iommu_dma_msi_page {

	struct list_head		list;
@@ -351,6 +352,21 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
return iova_reserve_iommu_regions(dev, domain);
  }
  
+static int handle_deferred_device(struct device *dev,

+   struct iommu_domain *domain)
+{
+   const struct iommu_ops *ops = domain->ops;
+
+   if (!is_kdump_kernel())
+   return 0;
+
+   if (unlikely(ops->is_attach_deferred &&
+   ops->is_attach_deferred(domain, dev)))
+   return iommu_attach_device(domain, dev);
+
+   return 0;
+}
+
  /**
   * dma_info_to_prot - Translate DMA API directions and attributes to IOMMU API
   *page flags.
@@ -463,6 +479,9 @@ static dma_addr_t __iommu_dma_map(struct device *dev, 
phys_addr_t phys,
size_t iova_off = iova_offset(iovad, phys);
dma_addr_t iova;
  
+	if (unlikely(handle_deferred_device(dev, domain)))

+   return DMA_MAPPING_ERROR;
+
size = iova_align(iovad, size + iova_off);
  
  	iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev);

@@ -581,6 +600,9 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
size_t size,
  
  	*dma_handle = DMA_MAPPING_ERROR;
  
+	if (unlikely(handle_deferred_device(dev, domain)))

+   return NULL;
+
min_size = alloc_sizes & -alloc_sizes;
if (min_size < PAGE_SIZE) {
min_size = PAGE_SIZE;
@@ -713,7 +735,7 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, 
struct page *page,
int prot = dma_info_to_prot(dir, coherent, attrs);
dma_addr_t dma_handle;
  
-	dma_handle =__iommu_dma_map(dev, phys, size, prot);

+   dma_handle = __iommu_dma_map(dev, phys, size, prot);
if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
dma_handle != DMA_MAPPING_ERROR)
arch_sync_dma_for_device(dev, phys, size, dir);
@@ -823,6 +845,9 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
unsigned long mask = dma_get_seg_boundary(dev);
int i;
  
+	if (unlikely(handle_deferred_device(dev, domain)))

+   return 0;
+
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
iommu_dma_sync_sg_for_device(dev, sg, nents, dir);
  




Re: [PATCH V5 2/5] iommu: Add gfp parameter to iommu_ops::map

2019-08-19 Thread Robin Murphy

On 15/08/2019 12:09, Tom Murphy wrote:

Add a gfp_t parameter to the iommu_ops::map function.
Remove the needless locking in the AMD iommu driver.

The iommu_ops::map function (or the iommu_map function which calls it)
was always supposed to be sleepable (according to Joerg's comment in
this thread: https://lore.kernel.org/patchwork/patch/977520/ ) and so
should probably have had a "might_sleep()" since it was written. However
currently the dma-iommu api can call iommu_map in an atomic context,
which it shouldn't do. This doesn't cause any problems because any iommu
driver which uses the dma-iommu api uses gfp_atomic in it's
iommu_ops::map function. But doing this wastes the memory allocators
atomic pools.


Looks reasonable to me - once we get the merges sorted out I'll take a 
look at propagating the flags through to io-pgtable for the SMMU drivers 
and friends.


Reviewed-by: Robin Murphy 


Signed-off-by: Tom Murphy 
---
  drivers/iommu/amd_iommu.c  |  3 ++-
  drivers/iommu/arm-smmu-v3.c|  2 +-
  drivers/iommu/arm-smmu.c   |  2 +-
  drivers/iommu/dma-iommu.c  |  6 ++---
  drivers/iommu/exynos-iommu.c   |  2 +-
  drivers/iommu/intel-iommu.c|  2 +-
  drivers/iommu/iommu.c  | 43 +-
  drivers/iommu/ipmmu-vmsa.c |  2 +-
  drivers/iommu/msm_iommu.c  |  2 +-
  drivers/iommu/mtk_iommu.c  |  2 +-
  drivers/iommu/mtk_iommu_v1.c   |  2 +-
  drivers/iommu/omap-iommu.c |  2 +-
  drivers/iommu/qcom_iommu.c |  2 +-
  drivers/iommu/rockchip-iommu.c |  2 +-
  drivers/iommu/s390-iommu.c |  2 +-
  drivers/iommu/tegra-gart.c |  2 +-
  drivers/iommu/tegra-smmu.c |  2 +-
  drivers/iommu/virtio-iommu.c   |  2 +-
  include/linux/iommu.h  | 21 -
  19 files changed, 77 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 1948be7ac8f8..0e53f9bd2be7 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3030,7 +3030,8 @@ static int amd_iommu_attach_device(struct iommu_domain 
*dom,
  }
  
  static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,

-phys_addr_t paddr, size_t page_size, int iommu_prot)
+phys_addr_t paddr, size_t page_size, int iommu_prot,
+gfp_t gfp)
  {
struct protection_domain *domain = to_pdomain(dom);
int prot = 0;
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index e7f49fd1a7ba..acc0eae7963f 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1975,7 +1975,7 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
  }
  
  static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,

-   phys_addr_t paddr, size_t size, int prot)
+   phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
  {
struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
  
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c

index aa06498f291d..05f42bdee494 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1284,7 +1284,7 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
  }
  
  static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,

-   phys_addr_t paddr, size_t size, int prot)
+   phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
  {
struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d991d40f797f..2712fbc68b28 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -469,7 +469,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, 
phys_addr_t phys,
if (!iova)
return DMA_MAPPING_ERROR;
  
-	if (iommu_map(domain, iova, phys - iova_off, size, prot)) {

+   if (iommu_map_atomic(domain, iova, phys - iova_off, size, prot)) {
iommu_dma_free_iova(cookie, iova, size);
return DMA_MAPPING_ERROR;
}
@@ -613,7 +613,7 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
size_t size,
arch_dma_prep_coherent(sg_page(sg), sg->length);
}
  
-	if (iommu_map_sg(domain, iova, sgt.sgl, sgt.orig_nents, ioprot)

+   if (iommu_map_sg_atomic(domain, iova, sgt.sgl, sgt.orig_nents, ioprot)
< size)
goto out_free_sg;
  
@@ -873,7 +873,7 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,

 * We'll leave any physical concatenation to the IOMMU driver's
 * implementation - it knows better than we do.
 */
-   if (iommu_map_sg(domain, iova, sg, nents, prot) < iova_len)
+   if (iommu_map_sg_atomic(d

Re: [PATCH 2/2] virtio/virtio_ring: Fix the dma_max_mapping_size call

2019-07-22 Thread Robin Murphy

On 22/07/2019 15:55, Eric Auger wrote:

Do not call dma_max_mapping_size for devices that have no DMA
mask set, otherwise we can hit a NULL pointer dereference.

This occurs when a virtio-blk-pci device is protected with
a virtual IOMMU.

Fixes: e6d6dd6c875e ("virtio: Introduce virtio_max_dma_size()")
Signed-off-by: Eric Auger 
Suggested-by: Christoph Hellwig 
---
  drivers/virtio/virtio_ring.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index c8be1c4f5b55..37c143971211 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -262,7 +262,7 @@ size_t virtio_max_dma_size(struct virtio_device *vdev)
  {
size_t max_segment_size = SIZE_MAX;
  
-	if (vring_use_dma_api(vdev))

+   if (vring_use_dma_api(vdev) && vdev->dev.dma_mask)


Hmm, might it make sense to roll that check up into vring_use_dma_api() 
itself? After all, if the device has no mask then it's likely that other 
DMA API ops wouldn't really work as expected either.
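
i.e. something like the sketch below, where the rest of vring_use_dma_api() is elided and assumed unchanged:

#include <linux/virtio.h>
#include <linux/virtio_config.h>

static bool vring_use_dma_api(struct virtio_device *vdev)
{
	/*
	 * Hypothetical early-out: a device with no DMA mask was never set
	 * up for the DMA API, so none of it can be expected to work.
	 */
	if (!vdev->dev.dma_mask)
		return false;

	/* ... existing IOMMU_PLATFORM/Xen checks as before ... */
	return !virtio_has_iommu_quirk(vdev);
}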


Robin.


		max_segment_size = dma_max_mapping_size(&vdev->dev);
  
  	return max_segment_size;





Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB

2019-01-16 Thread Robin Murphy

On 14/01/2019 18:20, Michael S. Tsirkin wrote:

On Mon, Jan 14, 2019 at 08:41:37PM +0800, Jason Wang wrote:


On 2019/1/14 5:50 PM, Christoph Hellwig wrote:

On Mon, Jan 14, 2019 at 05:41:56PM +0800, Jason Wang wrote:

On 2019/1/11 5:15 PM, Joerg Roedel wrote:

On Fri, Jan 11, 2019 at 11:29:31AM +0800, Jason Wang wrote:

Just wondering if my understanding is correct: IOMMU_PLATFORM must be set for
all virtio devices under AMD-SEV guests?

Yes, that is correct. Emulated DMA can only happen on the SWIOTLB
aperture, because that memory is not encrypted. The guest bounces the
data then to its encrypted memory.

Regards,

Joerg


Thanks, have you tested vhost-net in this case. I suspect it may not work

Which brings me back to my pet peeve that we need to take action so
that virtio uses the proper dma mapping API by default, with quirks
for legacy cases.  The magic bypass it uses is just causing problems
over problems.



Yes, I fully agree with you. This is probably an exact example of such
problem.

Thanks


I don't think so - the issue is really that DMA API does not yet handle
the SEV case 100% correctly. I suspect passthrough devices would have
the same issue.


Huh? Regardless of which virtio devices use it or not, the DMA API is 
handling the SEV case as correctly as it possibly can, by forcing 
everything through the unencrypted bounce buffer. If the segments being 
mapped are too big for that bounce buffer in the first place, there's 
nothing it can possibly do except fail, gracefully or otherwise.


Now, in theory, yes, the real issue at hand is not unique to virtio-blk 
nor SEV - any driver whose device has a sufficiently large DMA segment 
size and who manages to get sufficient physically-contiguous memory 
could technically generate a scatterlist segment longer than SWIOTLB can 
handle. However, in practice that basically never happens, not least 
because very few drivers ever override the default 64K DMA segment 
limit. AFAICS nothing in drivers/virtio is calling 
dma_set_max_seg_size() or otherwise assigning any dma_parms to replace 
the defaults either, so the really interesting question here is how are 
these apparently-out-of-spec 256K segments getting generated at all?
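
For comparison, a driver which genuinely needed bigger segments would be expected to say so explicitly - a sketch, assuming dev->dma_parms has already been set up (as the PCI core does for PCI devices):

#include <linux/dma-mapping.h>
#include <linux/sizes.h>

static int example_setup_dma(struct device *dev)
{
	int ret;

	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
	if (ret)
		return ret;

	/* Raise the default 64K segment limit; needs dev->dma_parms. */
	return dma_set_max_seg_size(dev, SZ_256K);
}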


Robin.

Re: [virtio-dev] Re: [PATCH v5 5/7] iommu: Add virtio-iommu driver

2018-12-15 Thread Robin Murphy

On 2018-12-12 3:27 pm, Auger Eric wrote:

Hi,

On 12/12/18 3:56 PM, Michael S. Tsirkin wrote:

On Fri, Dec 07, 2018 at 06:52:31PM +, Jean-Philippe Brucker wrote:

Sorry for the delay, I wanted to do a little more performance analysis
before continuing.

On 27/11/2018 18:10, Michael S. Tsirkin wrote:

On Tue, Nov 27, 2018 at 05:55:20PM +, Jean-Philippe Brucker wrote:

+   if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1) ||
+   !virtio_has_feature(vdev, VIRTIO_IOMMU_F_MAP_UNMAP))


Why bother with a feature bit for this then btw?


We'll need a new feature bit for sharing page tables with the hardware,
because they require different requests (attach_table/invalidate instead
of map/unmap.) A future device supporting page table sharing won't
necessarily need to support map/unmap.


I don't see virtio iommu being extended to support ARM specific
requests. This just won't scale, too many different
descriptor formats out there.


They aren't really ARM specific requests. The two new requests are
ATTACH_TABLE and INVALIDATE, which would be used by x86 IOMMUs as well.

Sharing CPU address space with the HW IOMMU (SVM) has been in the scope
of virtio-iommu since the first RFC, and I've been working with that
extension in mind since the beginning. As an example you can have a look
at my current draft for this [1], which is inspired from the VFIO work
we've been doing with Intel.

The negotiation phase inevitably requires vendor-specific fields in the
descriptors - host tells which formats are supported, guest chooses a
format and attaches page tables. But invalidation and fault reporting
descriptors are fairly generic.


We need to tread carefully here.  People expect that if a user does
lspci and sees a virtio device then it's reasonably portable.


If you want to go that way down the road, you should avoid
virtio iommu, instead emulate and share code with the ARM SMMU (probably
with a different vendor id so you can implement the
report on map for devices without PRI).


vSMMU has to stay in userspace though. The main reason we're proposing
virtio-iommu is that emulating every possible vIOMMU model in the kernel
would be unmaintainable. With virtio-iommu we can process the fast path
in the host kernel, through vhost-iommu, and do the heavy lifting in
userspace.


Interesting.


As said above, I'm trying to keep the fast path for
virtio-iommu generic.

More notes on what I consider to be the fast path, and comparison with
vSMMU:

(1) The primary use-case we have in mind for vIOMMU is something like
DPDK in the guest, assigning a hardware device to guest userspace. DPDK
maps a large amount of memory statically, to be used by a pass-through
device. For this case I don't think we care about vIOMMU performance.
Setup and teardown need to be reasonably fast, sure, but the MAP/UNMAP
requests don't have to be optimal.


(2) If the assigned device is owned by the guest kernel, then mappings
are dynamic and require dma_map/unmap() to be fast, but there generally
is no need for a vIOMMU, since device and drivers are trusted by the
guest kernel. Even when the user does enable a vIOMMU for this case
(allowing to over-commit guest memory, which needs to be pinned
otherwise),


BTW that's in theory; in practice it doesn't really work.


we generally play tricks like lazy TLBI (non-strict mode) to
make it faster.


Simple lazy TLB for guest/userspace drivers would be a big no no.
You need something smarter.


Here device and drivers are trusted, therefore the
vulnerability window of lazy mode isn't a concern.

If the reason to enable the vIOMMU is over-comitting guest memory
however, you can't use nested translation because it requires pinning
the second-level tables. For this case performance matters a bit,
because your invalidate-on-map needs to be fast, even if you enable lazy
mode and only receive inval-on-unmap every 10ms. It won't ever be as
fast as nested translation, though. For this case I think vSMMU+Caching
Mode and userspace virtio-iommu with MAP/UNMAP would perform similarly
(given page-sized payloads), because the pagetable walk doesn't add a
lot of overhead compared to the context switch. But given the results
below, vhost-iommu would be faster than vSMMU+CM.


(3) Then there is SVM. For SVM, any destructive change to the process
address space requires a synchronous invalidation command to the
hardware (at least when using PCI ATS). Given that SVM is based on page
faults, fault reporting from host to guest also needs to be fast, as
well as fault response from guest to host.

I think this is where performance matters the most. To get a feel of the
advantage we get with virtio-iommu, I compared the vSMMU page-table
sharing implementation [2] and vhost-iommu + VFIO with page table
sharing (based on Tomasz Nowicki's vhost-iommu prototype). That's on a
ThunderX2 with a 10Gb NIC assigned to the guest kernel, which
corresponds to case (2) above, with nesting page tables and without the
lazy mode. The 

Re: [PATCH v3 3/7] PCI: OF: Allow endpoints to bypass the iommu

2018-10-18 Thread Robin Murphy

On 17/10/18 16:14, Michael S. Tsirkin wrote:

On Mon, Oct 15, 2018 at 08:46:41PM +0100, Jean-philippe Brucker wrote:

[Replying with my personal address because we're having SMTP issues]

On 15/10/2018 11:52, Michael S. Tsirkin wrote:

On Fri, Oct 12, 2018 at 02:41:59PM -0500, Bjorn Helgaas wrote:

s/iommu/IOMMU/ in subject

On Fri, Oct 12, 2018 at 03:59:13PM +0100, Jean-Philippe Brucker wrote:

Using the iommu-map binding, endpoints in a given PCI domain can be
managed by different IOMMUs. Some virtual machines may allow a subset of
endpoints to bypass the IOMMU. In some case the IOMMU itself is presented


s/case/cases/


as a PCI endpoint (e.g. AMD IOMMU and virtio-iommu). Currently, when a
PCI root complex has an iommu-map property, the driver requires all
endpoints to be described by the property. Allow the iommu-map property to
have gaps.


I'm not an IOMMU or virtio expert, so it's not obvious to me why it is
safe to allow devices to bypass the IOMMU.  Does this mean a typo in
iommu-map could inadvertently allow devices to bypass it?



Thinking about this comment, I would like to ask: can't the
virtio device indicate the ranges in a portable way?
This would minimize the dependency on dt bindings and ACPI,
enabling support for systems that have neither but do
have virtio e.g. through pci.


I thought about adding a PROBE request for this in virtio-iommu, but it
wouldn't be usable by a Linux guest because of a bootstrapping problem.


Hmm. At some level it seems wrong to design hardware interfaces
around how Linux happens to probe things. That can change at any time
...


This isn't Linux-specific though. In general it's somewhere between 
difficult and impossible to pull in an IOMMU underneath a device after 
a device is active, so if any OS wants to use an IOMMU, it's going to 
want to know up-front that it's there and which devices it translates so 
that it can program said IOMMU appropriately *before* potentially 
starting DMA and/or interrupts from the relevant devices. Linux happens 
to do things in that order (either by firmware-driven probe-deferral or 
just perilous initcall ordering) because it is the only reasonable order 
in which to do them. AFAIK the platforms which don't rely on any 
firmware description of their IOMMU tend to have a fairly static system 
architecture (such that the OS simply makes hard-coded assumptions), so 
it's not necessarily entirely clear how they would cope with 
virtio-iommu either way.


Robin.


Early on, Linux needs a description of device dependencies, to determine
in which order to probe them. If the device dependency was described by
virtio-iommu itself, the guest could for example initialize a NIC,
allocate buffers and start DMA on the physical address space (which aborts
if the IOMMU implementation disallows DMA by default), only to find out
once the virtio-iommu module is loaded that it needs to cancel all DMA and
reconfigure the NIC. With a static description such as iommu-map in DT or
ACPI remapping tables, the guest can defer probing of the NIC until the
IOMMU is initialized.

Thanks,
Jean


Could you point me at the code you refer to here?




Re: [PATCH v3 3/7] PCI: OF: Allow endpoints to bypass the iommu

2018-10-15 Thread Robin Murphy

On 12/10/18 20:41, Bjorn Helgaas wrote:

s/iommu/IOMMU/ in subject

On Fri, Oct 12, 2018 at 03:59:13PM +0100, Jean-Philippe Brucker wrote:

Using the iommu-map binding, endpoints in a given PCI domain can be
managed by different IOMMUs. Some virtual machines may allow a subset of
endpoints to bypass the IOMMU. In some case the IOMMU itself is presented


s/case/cases/


as a PCI endpoint (e.g. AMD IOMMU and virtio-iommu). Currently, when a
PCI root complex has an iommu-map property, the driver requires all
endpoints to be described by the property. Allow the iommu-map property to
have gaps.


I'm not an IOMMU or virtio expert, so it's not obvious to me why it is
safe to allow devices to bypass the IOMMU.  Does this mean a typo in
iommu-map could inadvertently allow devices to bypass it?  Should we
indicate something in dmesg (and/or sysfs) about devices that bypass
it?


It's not really "allow devices to bypass the IOMMU" so much as "allow DT 
to describe devices which the IOMMU doesn't translate". It's a bit of an 
edge case for not-really-PCI devices, but FWIW I can certainly think of 
several ways to build real hardware like that. As for inadvertent errors 
leaving out IDs which *should* be in the map, that really depends on the 
IOMMU/driver implementation - e.g. SMMUv2 with arm-smmu.disable_bypass=0 
would treat the device as untranslated, whereas SMMUv3 would always 
generate a fault upon any transaction due to no valid stream table entry 
being programmed (not even a bypass one).
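
Just to illustrate, such a map with a hole in it might be written 
something like this in DT - the node name, phandle and ID ranges below 
are all made up:

	pci@40000000 {
		/* ...usual host bridge properties... */

		/*
		 * RIDs 0x0-0xf and 0x100-0x1ff are translated by the IOMMU;
		 * everything in between (0x10-0xff) is simply not translated.
		 */
		iommu-map = <0x0 &smmu 0x0 0x10>,
			    <0x100 &smmu 0x100 0x100>;
	};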


I reckon it's a sufficiently unusual case that keeping some sort of 
message probably is worthwhile (at pr_info rather than pr_err) in case 
someone does hit it by mistake.



Relaxing of_pci_map_rid also allows the msi-map property to have gaps,


At worst, I suppose we could always add yet another parameter for each 
caller to choose whether a missing entry is considered an error or not.


Robin.


s/of_pci_map_rid/of_pci_map_rid()/


which is invalid since MSIs always reach an MSI controller. Thankfully
Linux will error out later, when attempting to find an MSI domain for the
device.


Not clear to me what "error out" means here.  In a userspace program,
I would infer that the program exits with an error message, but I
doubt you mean that Linux exits.


Signed-off-by: Jean-Philippe Brucker 
---
  drivers/pci/of.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/of.c b/drivers/pci/of.c
index 1836b8ddf292..2f5015bdb256 100644
--- a/drivers/pci/of.c
+++ b/drivers/pci/of.c
@@ -451,9 +451,10 @@ int of_pci_map_rid(struct device_node *np, u32 rid,
return 0;
}
  
-	pr_err("%pOF: Invalid %s translation - no match for rid 0x%x on %pOF\n",
-   np, map_name, rid, target && *target ? *target : NULL);
-   return -EFAULT;
+   /* Bypasses translation */
+   if (id_out)
+   *id_out = rid;
+   return 0;
  }
  
  #if IS_ENABLED(CONFIG_OF_IRQ)

--
2.19.1




Re: [PATCH v2 1/5] dt-bindings: virtio: Specify #iommu-cells value for a virtio-iommu

2018-07-04 Thread Robin Murphy

On 27/06/18 18:46, Rob Herring wrote:

On Tue, Jun 26, 2018 at 11:59 AM Jean-Philippe Brucker
 wrote:


On 25/06/18 20:27, Rob Herring wrote:

On Thu, Jun 21, 2018 at 08:06:51PM +0100, Jean-Philippe Brucker wrote:

A virtio-mmio node may represent a virtio-iommu device. This is discovered
by the virtio driver at probe time, but the DMA topology isn't
discoverable and must be described by firmware. For DT the standard IOMMU
description is used, as specified in bindings/iommu/iommu.txt and
bindings/pci/pci-iommu.txt. Like many other IOMMUs, virtio-iommu
distinguishes masters by their endpoint IDs, which requires one IOMMU cell
in the "iommus" property.

Signed-off-by: Jean-Philippe Brucker 
---
  Documentation/devicetree/bindings/virtio/mmio.txt | 8 
  1 file changed, 8 insertions(+)

diff --git a/Documentation/devicetree/bindings/virtio/mmio.txt b/Documentation/devicetree/bindings/virtio/mmio.txt
index 5069c1b8e193..337da0e3a87f 100644
--- a/Documentation/devicetree/bindings/virtio/mmio.txt
+++ b/Documentation/devicetree/bindings/virtio/mmio.txt
@@ -8,6 +8,14 @@ Required properties:
  - reg:  control registers base address and size including configuration space
  - interrupts:   interrupt generated by the device

+Required properties for virtio-iommu:
+
+- #iommu-cells: When the node describes a virtio-iommu device, it is
+linked to DMA masters using the "iommus" property as
+described in devicetree/bindings/iommu/iommu.txt. For
+virtio-iommu #iommu-cells must be 1, each cell describing
+a single endpoint ID.


The iommus property should also be documented for the client side.


Isn't section "IOMMU master node" of iommu.txt sufficient? Since the
iommus property applies to any DMA master, not only virtio-mmio devices,
the canonical description in iommu.txt seems the best place for it, and
I'm not sure what to add in this file. Maybe a short example below the
virtio_block one?


No, because somewhere we have to capture if 'iommus' is valid for
'virtio-mmio' or not. Hopefully soon we'll actually be able to
validate that.


Indeed, it's rather unusual to have a single compatible which may either 
be an IOMMU or an IOMMU client (but not both at once, I hope!), so 
nailing down the exact semantics as clearly as possible would definitely 
be desirable.
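
For the sake of discussion, a minimal sketch of how I'd expect the two 
roles to end up looking side by side - the node names, addresses and 
endpoint ID here are entirely made up:

	viommu: virtio@10000 {
		compatible = "virtio,mmio";
		reg = <0x10000 0x200>;
		interrupts = <8>;
		#iommu-cells = <1>;	/* this instance acts as an IOMMU */
	};

	virtio@20000 {
		compatible = "virtio,mmio";
		reg = <0x20000 0x200>;
		interrupts = <9>;
		iommus = <&viommu 0x4>;	/* this one is a client, endpoint ID 4 */
	};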


Robin.


Re: [PATCH 2/4] iommu/virtio: Add probe request

2018-03-23 Thread Robin Murphy

On 14/02/18 14:53, Jean-Philippe Brucker wrote:

When the device offers the probe feature, send a probe request for each
device managed by the IOMMU. Extract RESV_MEM information. When we
encounter an MSI doorbell region, set it up as an IOMMU_RESV_MSI region.
This will tell other subsystems that there is no need to map the MSI
doorbell in the virtio-iommu, because MSIs bypass it.

Signed-off-by: Jean-Philippe Brucker 
---
  drivers/iommu/virtio-iommu.c  | 163 --
  include/uapi/linux/virtio_iommu.h |  37 +
  2 files changed, 193 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index a9c9245e8ba2..3ac4b38eaf19 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -45,6 +45,7 @@ struct viommu_dev {
struct iommu_domain_geometrygeometry;
u64 pgsize_bitmap;
u8  domain_bits;
+   u32 probe_size;
  };
  
  struct viommu_mapping {

@@ -72,6 +73,7 @@ struct viommu_domain {
  struct viommu_endpoint {
struct viommu_dev   *viommu;
struct viommu_domain*vdomain;
+   struct list_headresv_regions;
  };
  
  struct viommu_request {

@@ -140,6 +142,10 @@ static int viommu_get_req_size(struct viommu_dev *viommu,
case VIRTIO_IOMMU_T_UNMAP:
size = sizeof(r->unmap);
break;
+   case VIRTIO_IOMMU_T_PROBE:
+   *bottom += viommu->probe_size;
+   size = sizeof(r->probe) + *bottom;
+   break;
default:
return -EINVAL;
}
@@ -448,6 +454,105 @@ static int viommu_replay_mappings(struct viommu_domain *vdomain)
return ret;
  }
  
+static int viommu_add_resv_mem(struct viommu_endpoint *vdev,

+  struct virtio_iommu_probe_resv_mem *mem,
+  size_t len)
+{
+   struct iommu_resv_region *region = NULL;
+   unsigned long prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
+
+   u64 addr = le64_to_cpu(mem->addr);
+   u64 size = le64_to_cpu(mem->size);
+
+   if (len < sizeof(*mem))
+   return -EINVAL;
+
+   switch (mem->subtype) {
+   case VIRTIO_IOMMU_RESV_MEM_T_MSI:
+   region = iommu_alloc_resv_region(addr, size, prot,
+IOMMU_RESV_MSI);
+   break;
+   case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:
+   default:
+   region = iommu_alloc_resv_region(addr, size, 0,
+IOMMU_RESV_RESERVED);
+   break;
+   }
+
+   list_add(>resv_regions, >list);
+
+   /*
+* Treat unknown subtype as RESERVED, but urge users to update their
+* driver.
+*/
+   if (mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_RESERVED &&
+   mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_MSI)
+   pr_warn("unknown resv mem subtype 0x%x\n", mem->subtype);


Might as well avoid the extra comparisons by incorporating this into the 
switch statement, i.e.:


default:
dev_warn(vdev->viommu_dev->dev, ...);
/* Fallthrough */
case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:
...

(dev_warn is generally preferable to pr_warn when feasible)
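
To spell it out, I'd imagine the reworked switch ending up something 
like this (untested, and assuming vdev->viommu->dev is the right device 
to warn against here):

	switch (mem->subtype) {
	default:
		dev_warn(vdev->viommu->dev, "unknown resv mem subtype 0x%x\n",
			 mem->subtype);
		/* Fallthrough */
	case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:
		region = iommu_alloc_resv_region(addr, size, 0,
						 IOMMU_RESV_RESERVED);
		break;
	case VIRTIO_IOMMU_RESV_MEM_T_MSI:
		region = iommu_alloc_resv_region(addr, size, prot,
						 IOMMU_RESV_MSI);
		break;
	}

which also lets you drop the separate subtype check after the list_add().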


+
+   return 0;
+}
+
+static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device *dev)
+{
+   int ret;
+   u16 type, len;
+   size_t cur = 0;
+   struct virtio_iommu_req_probe *probe;
+   struct virtio_iommu_probe_property *prop;
+   struct iommu_fwspec *fwspec = dev->iommu_fwspec;
+   struct viommu_endpoint *vdev = fwspec->iommu_priv;
+
+   if (!fwspec->num_ids)
+   /* Trouble ahead. */
+   return -EINVAL;
+
+   probe = kzalloc(sizeof(*probe) + viommu->probe_size +
+   sizeof(struct virtio_iommu_req_tail), GFP_KERNEL);
+   if (!probe)
+   return -ENOMEM;
+
+   probe->head.type = VIRTIO_IOMMU_T_PROBE;
+   /*
+* For now, assume that properties of an endpoint that outputs multiple
+* IDs are consistent. Only probe the first one.
+*/
+   probe->endpoint = cpu_to_le32(fwspec->ids[0]);
+
+   ret = viommu_send_req_sync(viommu, probe);
+   if (ret)
+   goto out_free;
+
+   prop = (void *)probe->properties;
+   type = le16_to_cpu(prop->type) & VIRTIO_IOMMU_PROBE_T_MASK;
+
+   while (type != VIRTIO_IOMMU_PROBE_T_NONE &&
+  cur < viommu->probe_size) {
+   len = le16_to_cpu(prop->length);
+
+   switch (type) {
+   case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
+   ret = viommu_add_resv_mem(vdev, (void *)prop->value, len);
+   break;
+

Re: [PATCH 1/4] iommu: Add virtio-iommu driver

2018-03-23 Thread Robin Murphy

On 14/02/18 14:53, Jean-Philippe Brucker wrote:

The virtio IOMMU is a para-virtualized device that allows sending IOMMU
requests such as map/unmap over the virtio-mmio transport without emulating
page tables. This implementation handles ATTACH, DETACH, MAP and UNMAP
requests.

The bulk of the code transforms calls coming from the IOMMU API into
corresponding virtio requests. Mappings are kept in an interval tree
instead of page tables.

Signed-off-by: Jean-Philippe Brucker 
---
  MAINTAINERS   |   6 +
  drivers/iommu/Kconfig |  11 +
  drivers/iommu/Makefile|   1 +
  drivers/iommu/virtio-iommu.c  | 960 ++
  include/uapi/linux/virtio_ids.h   |   1 +
  include/uapi/linux/virtio_iommu.h | 116 +
  6 files changed, 1095 insertions(+)
  create mode 100644 drivers/iommu/virtio-iommu.c
  create mode 100644 include/uapi/linux/virtio_iommu.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 3bdc260e36b7..2a181924d420 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14818,6 +14818,12 @@ S: Maintained
  F:drivers/virtio/virtio_input.c
  F:include/uapi/linux/virtio_input.h
  
+VIRTIO IOMMU DRIVER

+M: Jean-Philippe Brucker 
+S: Maintained
+F: drivers/iommu/virtio-iommu.c
+F: include/uapi/linux/virtio_iommu.h
+
  VIRTUAL BOX GUEST DEVICE DRIVER
  M:Hans de Goede 
  M:Arnd Bergmann 
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index f3a21343e636..1ea0ec74524f 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -381,4 +381,15 @@ config QCOM_IOMMU
help
  Support for IOMMU on certain Qualcomm SoCs.
  
+config VIRTIO_IOMMU

+   bool "Virtio IOMMU driver"
+   depends on VIRTIO_MMIO
+   select IOMMU_API
+   select INTERVAL_TREE
+   select ARM_DMA_USE_IOMMU if ARM
+   help
+ Para-virtualised IOMMU driver with virtio.
+
+ Say Y here if you intend to run this kernel as a guest.
+
  endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 1fb695854809..9c68be1365e1 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -29,3 +29,4 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
  obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
  obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
  obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
+obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
new file mode 100644
index ..a9c9245e8ba2
--- /dev/null
+++ b/drivers/iommu/virtio-iommu.c
@@ -0,0 +1,960 @@
+/*
+ * Virtio driver for the paravirtualized IOMMU
+ *
+ * Copyright (C) 2018 ARM Limited
+ * Author: Jean-Philippe Brucker 
+ *
+ * SPDX-License-Identifier: GPL-2.0


This wants to be a // comment at the very top of the file (thankfully 
the policy is now properly documented in-tree since 
Documentation/process/license-rules.rst got merged)
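
i.e. the top of the file would become something like:

	// SPDX-License-Identifier: GPL-2.0
	/*
	 * Virtio driver for the paravirtualized IOMMU
	 *
	 * Copyright (C) 2018 ARM Limited
	 */

with the author line and the rest of the header kept as-is.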



+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#define MSI_IOVA_BASE  0x800
+#define MSI_IOVA_LENGTH 0x10
+
+struct viommu_dev {
+   struct iommu_device iommu;
+   struct device   *dev;
+   struct virtio_device*vdev;
+
+   struct ida  domain_ids;
+
+   struct virtqueue*vq;
+   /* Serialize anything touching the request queue */
+   spinlock_t  request_lock;
+
+   /* Device configuration */
+   struct iommu_domain_geometrygeometry;
+   u64 pgsize_bitmap;
+   u8  domain_bits;
+};
+
+struct viommu_mapping {
+   phys_addr_t paddr;
+   struct interval_tree_node   iova;
+   union {
+   struct virtio_iommu_req_map map;
+   struct virtio_iommu_req_unmap unmap;
+   } req;
+};
+
+struct viommu_domain {
+   struct iommu_domain domain;
+   struct viommu_dev   *viommu;
+   struct mutexmutex;
+   unsigned intid;
+
+   spinlock_t  mappings_lock;
+   struct rb_root_cached   mappings;
+
+   /* Number of endpoints attached to this domain */
+   unsigned long   endpoints;
+};
+
+struct viommu_endpoint {
+   struct viommu_dev   *viommu;
+   struct viommu_domain*vdomain;
+};
+
+struct viommu_request {
+   struct scatterlist  top;
+   struct scatterlist  bottom;
+
+   int 

Re: [PATCH 1/4] iommu: Add virtio-iommu driver

2018-03-21 Thread Robin Murphy

On 21/03/18 13:14, Jean-Philippe Brucker wrote:

On 21/03/18 06:43, Tian, Kevin wrote:
[...]

+
+#include 
+
+#define MSI_IOVA_BASE  0x800
+#define MSI_IOVA_LENGTH 0x10


this is ARM specific, and according to virtio-iommu spec isn't it
better probed on the endpoint instead of hard-coding here?


These values are arbitrary, not really ARM-specific even if ARM is the
only user yet: we're just reserving a random IOVA region for mapping MSIs.
It is hard-coded because of the way iommu-dma.c works, but I don't quite
remember why that allocation isn't dynamic.


The host kernel needs to have *some* MSI region in place before the 
guest can start configuring interrupts, otherwise it won't know what 
address to give to the underlying hardware. However, as soon as the host 
kernel has picked a region, host userspace needs to know that it can no 
longer use addresses in that region for DMA-able guest memory. It's a 
lot easier when the address is fixed in hardware and the host userspace 
will never be stupid enough to try and VFIO_IOMMU_DMA_MAP it, but in the 
more general case where MSI writes undergo IOMMU address translation so 
it's an arbitrary IOVA, this has the potential to conflict with stuff 
like guest memory hotplug.


What we currently have is just the simplest option, with the host kernel 
just picking something up-front and pretending to host userspace that 
it's a fixed hardware address. There's certainly scope for it to be a 
bit more dynamic in the sense of adding an interface to let userspace 
move it around (before attaching any devices, at least), but I don't 
think it's feasible for the host kernel to second-guess userspace enough 
to make it entirely transparent like it is in the DMA API domain case.
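
For reference, the shape of what the SMMU drivers do today to advertise 
that fixed window is roughly the following, and I'd expect virtio-iommu 
to grow something equivalent - this is only a sketch, the function name 
is hypothetical and it reuses the MSI_IOVA_BASE/MSI_IOVA_LENGTH 
definitions from the driver above:

	static void viommu_get_resv_regions(struct device *dev,
					    struct list_head *head)
	{
		int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
		struct iommu_resv_region *region;

		/* Advertise the fixed software-MSI window to iommu-dma and VFIO */
		region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH,
						 prot, IOMMU_RESV_SW_MSI);
		if (region)
			list_add_tail(&region->list, head);
	}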


Of course, that's all assuming the host itself is using a virtio-iommu 
(e.g. in a nested virt or emulation scenario). When it's purely within a 
guest then an MSI reservation shouldn't matter so much, since the guest 
won't be anywhere near the real hardware configuration anyway.


Robin.


As said on the v0.6 spec thread, I'm not sure allocating the IOVA range in
the host is preferable. With nested translation the guest has to map it
anyway, and I believe dealing with IOVA allocation should be left to the
guest when possible.

Thanks,
Jean


Re: [PATCH 4/4] vfio: Allow type-1 IOMMU instantiation with a virtio-iommu

2018-02-14 Thread Robin Murphy

On 14/02/18 15:26, Alex Williamson wrote:

On Wed, 14 Feb 2018 14:53:40 +
Jean-Philippe Brucker  wrote:


When enabling both VFIO and VIRTIO_IOMMU modules, automatically select
VFIO_IOMMU_TYPE1 as well.

Signed-off-by: Jean-Philippe Brucker 
---
  drivers/vfio/Kconfig | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index c84333eb5eb5..65a1e691110c 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -21,7 +21,7 @@ config VFIO_VIRQFD
  menuconfig VFIO
tristate "VFIO Non-Privileged userspace driver framework"
depends on IOMMU_API
-   select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM_SMMU || ARM_SMMU_V3)
+   select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM_SMMU || ARM_SMMU_V3 || VIRTIO_IOMMU)
select ANON_INODES
help
  VFIO provides a framework for secure userspace device drivers.


Why are we basing this on specific IOMMU drivers in the first place?
Only ARM is doing that.  Shouldn't IOMMU_API only be enabled for ARM
targets that support it and therefore we can forget about the specific
IOMMU drivers?  Thanks,


Makes sense - the majority of ARM systems (and mobile/embedded ARM64 
ones) making use of IOMMU_API won't actually support VFIO, but it can't 
hurt to allow them to select the type 1 driver regardless. Especially as 
multiplatform configs are liable to be pulling in the SMMU driver(s) anyway.


Robin.


[PATCH v2 1/2] virtio: Make ARM SMMU workaround more specific

2017-02-02 Thread Robin Murphy
Whilst always using the DMA API is OK on ARM systems in most cases,
there can be a problem if a hypervisor fails to tell its guest that a
virtio device is cache-coherent. In that case, the guest will end up
making non-cacheable mappings for DMA buffers (i.e. the vring), which,
if the host is using a cacheable view of the same buffer on the other
end, is not a recipe for success.

It turns out that current kvmtool, and probably QEMU as well, runs into
this exact problem, and a guest using a virtio console can be seen to
hang pretty quickly after writing a few characters as host data in cache
and guest data directly in RAM go out of sync.

In order to fix this, narrow the scope of the original workaround from
all legacy devices to just those behind IOMMUs, which was really the
only thing we were trying to deal with in the first place.

Fixes: c7070619f340 ("vring: Force use of DMA API for ARM-based systems with legacy devices")
Signed-off-by: Robin Murphy <robin.mur...@arm.com>
---
 drivers/virtio/virtio_ring.c | 30 --
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 7e38ed79c3fc..03e824c77d61 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -117,6 +118,27 @@ struct vring_virtqueue {
 #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
 
 /*
+ * ARM Fast Models are hopefully unique in implementing "hardware" legacy
+ * virtio block devices, which can be placed behind a "real" IOMMU, but are
+ * unaware of VIRTIO_F_IOMMU_PLATFORM. Fortunately, we can detect whether
+ * an IOMMU is present and in use by checking whether an IOMMU driver has
+ * assigned the DMA master device a group.
+ */
+static bool vring_arm_legacy_dma_quirk(struct virtio_device *vdev)
+{
+   struct iommu_group *group;
+
+   if (!(IS_ENABLED(CONFIG_ARM) || IS_ENABLED(CONFIG_ARM64)) ||
+   virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
+   return false;
+
+   group = iommu_group_get(vdev->dev.parent);
+   iommu_group_put(group);
+
+   return group != NULL;
+}
+
+/*
  * Modern virtio devices have feature bits to specify whether they need a
  * quirk and bypass the IOMMU. If not there, just use the DMA API.
  *
@@ -159,12 +181,8 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
if (xen_domain())
return true;
 
-   /*
-* On ARM-based machines, the DMA ops will do the right thing,
-* so always use them with legacy devices.
-*/
-   if (IS_ENABLED(CONFIG_ARM) || IS_ENABLED(CONFIG_ARM64))
-   return !virtio_has_feature(vdev, VIRTIO_F_VERSION_1);
+   if (vring_arm_legacy_dma_quirk(vdev))
+   return true;
 
return false;
 }
-- 
2.11.0.dirty



[PATCH v2 2/2] virtio: Document DMA coherency

2017-02-02 Thread Robin Murphy
Since making use of the DMA API will require the architecture code to
have the correct notion of device cache-coherency on architectures like
ARM, explicitly call this out in the virtio-mmio DT binding. The ship
has sailed for legacy virtio, but let's hope that we can head off any
future firmware mishaps.

Signed-off-by: Robin Murphy <robin.mur...@arm.com>
---
 Documentation/devicetree/bindings/virtio/mmio.txt | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/Documentation/devicetree/bindings/virtio/mmio.txt b/Documentation/devicetree/bindings/virtio/mmio.txt
index 5069c1b8e193..999a93faa67c 100644
--- a/Documentation/devicetree/bindings/virtio/mmio.txt
+++ b/Documentation/devicetree/bindings/virtio/mmio.txt
@@ -7,6 +7,16 @@ Required properties:
 - compatible:  "virtio,mmio" compatibility string
 - reg: control registers base address and size including configuration space
 - interrupts:  interrupt generated by the device
+- dma-coherent:	required if the device (or host emulation) accesses memory
+   cache-coherently, absent otherwise
+
+Linux implementation note:
+
+virtio devices not advertising the VIRTIO_F_IOMMU_PLATFORM flag have been
+implicitly assumed to be cache-coherent by Linux, and for legacy reasons this
+behaviour is likely to remain.  If VIRTIO_F_IOMMU_PLATFORM is advertised, then
+such assumptions cannot be relied upon and the "dma-coherent" property must
+accurately reflect the coherency of the device.
 
 Example:
 
@@ -14,4 +24,5 @@ Example:
compatible = "virtio,mmio";
reg = <0x3000 0x100>;
interrupts = <41>;
+   dma-coherent;
}
-- 
2.11.0.dirty


