RE: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage

2021-04-06 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
Hi Baolu,

> -Original Message-
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Friday, April 2, 2021 12:44 PM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> ; iommu@lists.linux-foundation.org;
> linux-ker...@vger.kernel.org
> Cc: baolu...@linux.intel.com; David Woodhouse ; Nadav
> Amit ; Alex Williamson ;
> Kevin Tian ; Gonglei (Arei) ;
> sta...@vger.kernel.org
> Subject: Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating 
> superpage
> 
> Hi Longpeng,
> 
> On 4/1/21 3:18 PM, Longpeng(Mike) wrote:
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index ee09323..cbcb434 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -2342,9 +2342,20 @@ static inline int hardware_largepage_caps(struct
> dmar_domain *domain,
> >  * removed to make room for superpage(s).
> >  * We're adding new large pages, so make sure
> >  * we don't remove their parent tables.
> > +*
> > +* We also need to flush the iotlb before 
> > creating
> > +* superpage to ensure it does not perserves any
> > +* obsolete info.
> >  */
> > -   dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
> > -  largepage_lvl + 1);
> > +   if (dma_pte_present(pte)) {
> 
> The dma_pte_free_pagetable() clears a batch of PTEs. So checking current PTE 
> is
> insufficient. How about removing this check and always performing cache
> invalidation?
> 

Um...the PTE here may be present( e.g. 4K mapping --> superpage mapping ) or 
NOT-present ( e.g. create a totally new superpage mapping ), but we only need 
to call free_pagetable and flush_iotlb in the former case, right ?

> > +   int i;
> > +
> > +   dma_pte_free_pagetable(domain, iov_pfn, 
> > end_pfn,
> > +  largepage_lvl + 
> > 1);
> > +   for_each_domain_iommu(i, domain)
> > +   
> > iommu_flush_iotlb_psi(g_iommus[i], domain,
> > + iov_pfn, 
> > nr_pages, 0, 0);
> > +
> 
> Best regards,
> baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RESEND] lib/scatterlist: Fix NULL pointer deference

2021-04-06 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-06 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Tuesday, April 6, 2021 8:21 PM
> 
> On Tue, Apr 06, 2021 at 01:02:05AM +, Tian, Kevin wrote:
> > > From: Jason Gunthorpe 
> > > Sent: Tuesday, April 6, 2021 7:40 AM
> > >
> > > On Fri, Apr 02, 2021 at 07:58:02AM +, Tian, Kevin wrote:
> > > > > From: Jason Gunthorpe 
> > > > > Sent: Thursday, April 1, 2021 9:47 PM
> > > > >
> > > > > On Thu, Apr 01, 2021 at 01:43:36PM +, Liu, Yi L wrote:
> > > > > > > From: Jason Gunthorpe 
> > > > > > > Sent: Thursday, April 1, 2021 9:16 PM
> > > > > > >
> > > > > > > On Thu, Apr 01, 2021 at 01:10:48PM +, Liu, Yi L wrote:
> > > > > > > > > From: Jason Gunthorpe 
> > > > > > > > > Sent: Thursday, April 1, 2021 7:47 PM
> > > > > > > > [...]
> > > > > > > > > I'm worried Intel views the only use of PASID in a guest is 
> > > > > > > > > with
> > > > > > > > > ENQCMD, but that is not consistent with the industry. We need
> to
> > > see
> > > > > > > > > normal nested PASID support with assigned PCI VFs.
> > > > > > > >
> > > > > > > > I'm not quire flow here. Intel also allows PASID usage in guest
> > > without
> > > > > > > > ENQCMD. e.g. Passthru a PF to guest, and use PASID on it
> without
> > > > > > > ENQCMD.
> > > > > > >
> > > > > > > Then you need all the parts, the hypervisor calls from the vIOMMU,
> > > and
> > > > > > > you can't really use a vPASID.
> > > > > >
> > > > > > This is a diagram shows the vSVA setup.
> > > > >
> > > > > I'm not talking only about vSVA. Generic PASID support with arbitary
> > > > > mappings.
> > > > >
> > > > > And how do you deal with the vPASID vs pPASID issue if the system
> has
> > > > > a mix of physical devices and mdevs?
> > > > >
> > > >
> > > > We plan to support two schemes. One is vPASID identity-mapped to
> > > > pPASID then the mixed scenario just works, with the limitation of
> > > > lacking of live migration support. The other is non-identity-mapped
> > > > scheme, where live migration is supported but physical devices and
> > > > mdevs should not be mixed in one VM if both expose SVA capability
> > > > (requires some filtering check in Qemu).
> > >
> > > That just becomes "block vPASID support if any device that
> > > doesn't use ENQCMD is plugged into the guest"
> >
> > The limitation is only for physical device. and in reality it is not that
> > bad. To support live migration with physical device we anyway need
> > additional work to migrate the device state (e.g. based on Max's work),
> > then it's not unreasonable to also mediate guest programming of
> > device specific PASID register to enable vPASID (need to translate in
> > the whole VM lifespan but likely is not a hot path).
> 
> IMHO that is pretty unreasonable.. More likely we end up with vPASID
> tables in each migratable device like KVM has.

just like mdev needs to maintain allowed PASID list, this extends it to
all migratable devices.

> 
> > > Which needs a special VFIO capability of some kind so qemu knows to
> > > block it. This really needs to all be layed out together so someone
> > > can understand it :(
> >
> > Or could simply based on whether the VFIO device supports live migration.
> 
> You need to define affirmative caps that indicate that vPASID will be
> supported by the VFIO device.

Yes, this is required as I acked in another mail.

> 
> > > Why doesn't the siov cookbook explaining this stuff??
> > >
> > > > We hope the /dev/ioasid can support both schemes, with the minimal
> > > > requirement of allowing userspace to tag a vPASID to a pPASID and
> > > > allowing mdev to translate vPASID into pPASID, i.e. not assuming that
> > > > the guest will always use pPASID.
> > >
> > > What I'm a unclear of is if /dev/ioasid even needs to care about
> > > vPASID or if vPASID is just a hidden artifact of the KVM connection to
> > > setup the translation table and the vIOMMU driver in qemu.
> >
> > Not just for KVM. Also required by mdev, which needs to translate
> > vPASID into pPASID when ENQCMD is not used.
> 
> Do we have any mdev's that will do this?

definitely. Actually any mdev which doesn't do ENQCMD needs to do this.
In normal case, the PASID is programmed to a MMIO register (or in-memory
context) associate with the backend resource of the mdev. The value 
programmed from the guest is vPASID, thus must be translated into pPASID
before updating the physical register.

> 
> > should only care about the operations related to pPASID. VFIO could
> > carry vPASID information to mdev.
> 
> It depends how common this is, I suppose
> 

based on above I think it's a common case.

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-06 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Tuesday, April 6, 2021 8:35 PM
> 
> On Tue, Apr 06, 2021 at 01:27:15AM +, Tian, Kevin wrote:
> >
> > and here is one example why using existing VFIO/VDPA interface makes
> > sense. say dev1 (w/ sva) and dev2 (w/o sva) are placed in a single VFIO
> > container.
> 
> Forget about SVA, it is an irrelevant detail of how a PASID is
> configured.
> 
> > The container is associated to an iommu domain which contains a
> > single 2nd-level page table, shared by both devices (when attached
> > to the domain).
> 
> This level should be described by an ioasid.
> 
> > The VFIO MAP operation is applied to the 2nd-level
> > page table thus naturally applied to both devices. Then userspace
> > could use /dev/ioasid to further allocate IOASIDs and bind multiple
> > 1st-level page tables for dev1, nested on the shared 2nd-level page
> > table.
> 
> Because if you don't then we enter insane world where a PASID is being
> created under /dev/ioasid but its translation path flows through setup
> done by VFIO and the whole user API becomes an incomprehensible mess.
> 
> How will you even associate the PASID with the other translation??

PASID is attached to a specific iommu domain (created by VFIO/VDPA), which
has GPA->HPA mappings already configured. If we view that mapping as an
attribute of the iommu domain, it's reasonable to have the userspace-bound
pgtable through /dev/ioasid to nest on it.


> 
> The entire translation path for any ioasid or PASID should be defined
> only by /dev/ioasid. Everything else is a legacy API.
> 
> > If following your suggestion then VFIO must deny VFIO MAP operations
> > on sva1 (assume userspace should not mix sva1 and sva2 in the same
> > container and instead use /dev/ioasid to map for sva1)?
> 
> No, userspace creates an iosaid for the guest physical mapping and
> passes this ioasid to VFIO PCI which will assign it as the first layer
> mapping on the RID

Is it an dummy ioasid just for providing GPA mappings for nesting purpose
of other IOASIDs? Then we waste one per VM?

> 
> When PASIDs are allocated the uAPI will be told to logically nested
> under the first ioasid. When VFIO authorizes a PASID for a RID it
> checks that all the HW rules are being followed.

As I explained above, why cannot we just use iommu domain to connect 
the dots? Every passthrough framework needs to create an iommu domain
first. and It needs to support both devices w/ PASID and devices w/o PASID.
For devices w/o PASID it needs to invent its own MAP interface anyway.
Then why do we bother creating another MAP interface through /dev/ioasid
which not only duplicates but also creating transition burden between 
two set of MAP interfaces when the guest turns on/off the pasid capability
on the device?

> 
> If there are rules like groups of VFIO devices must always use the
> same IOASID then VFIO will check these too (and realistically qemu
> will have only one guest physical map ioasid anyhow)
> 
> There is no real difference between setting up an IOMMU table for a
> (RID,PASID) tuple or just a RID. We can do it universally with
> one interface for all consumers.
> 

'universally' upon from which angle you look at this problem. From IOASID
p.o.v possibly yes, but from device passthrough p.o.v. it's the opposite
since the passthrough framework needs to handle devices w/o PASID anyway
(or even for device w/ PASID it could send traffic w/o PASID) thus 'universally'
makes more sense if the passthrough framework can use one interface of its
own to manage GPA mappings for all consumers (apply to the case when a
PASID is allowed/authorized).

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 7/9] vfio/mdev: Add iommu related member in mdev_device

2021-04-06 Thread Lu Baolu

Hi Jason,

On 4/7/21 4:00 AM, Jason Gunthorpe wrote:

On Mon, Mar 25, 2019 at 09:30:34AM +0800, Lu Baolu wrote:

A parent device might create different types of mediated
devices. For example, a mediated device could be created
by the parent device with full isolation and protection
provided by the IOMMU. One usage case could be found on
Intel platforms where a mediated device is an assignable
subset of a PCI, the DMA requests on behalf of it are all
tagged with a PASID. Since IOMMU supports PASID-granular
translations (scalable mode in VT-d 3.0), this mediated
device could be individually protected and isolated by an
IOMMU.

This patch adds a new member in the struct mdev_device to
indicate that the mediated device represented by mdev could
be isolated and protected by attaching a domain to a device
represented by mdev->iommu_device. It also adds a helper to
add or set the iommu device.

* mdev_device->iommu_device
   - This, if set, indicates that the mediated device could
 be fully isolated and protected by IOMMU via attaching
 an iommu domain to this device. If empty, it indicates
 using vendor defined isolation, hence bypass IOMMU.

* mdev_set/get_iommu_device(dev, iommu_device)
   - Set or get the iommu device which represents this mdev
 in IOMMU's device scope. Drivers don't need to set the
 iommu device if it uses vendor defined isolation.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Liu Yi L 
Suggested-by: Kevin Tian 
Suggested-by: Alex Williamson 
Signed-off-by: Lu Baolu 
Reviewed-by: Jean-Philippe Brucker 
---
  drivers/vfio/mdev/mdev_core.c| 18 ++
  drivers/vfio/mdev/mdev_private.h |  1 +
  include/linux/mdev.h | 14 ++
  3 files changed, 33 insertions(+)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index b96fedc77ee5..1b6435529166 100644
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -390,6 +390,24 @@ int mdev_device_remove(struct device *dev, bool 
force_remove)
return 0;
  }
  
+int mdev_set_iommu_device(struct device *dev, struct device *iommu_device)

+{
+   struct mdev_device *mdev = to_mdev_device(dev);
+
+   mdev->iommu_device = iommu_device;
+
+   return 0;
+}
+EXPORT_SYMBOL(mdev_set_iommu_device);


I was looking at these functions when touching the mdev stuff and I
have some concerns.

1) Please don't merge dead code. It is a year later and there is still
no in-tree user for any of this. This is not our process. Even
worse it was exported so it looks like this dead code is supporting
out of tree modules.

2) Why is this like this? Every struct device already has a connection
to the iommu layer and every mdev has a struct device all its own.

Why did we need to add special 'if (mdev)' stuff all over the
place? This smells like the same abuse Thomas
and I pointed out for the interrupt domains.


I've ever tried to implement a bus iommu_ops for mdev devices.

https://lore.kernel.org/lkml/20201030045809.957927-1-baolu...@linux.intel.com/

Any comments?

Best regards,
baolu



After my next series the mdev drivers will have direct access to
the vfio_device. So an alternative to using the struct device, or
adding 'if mdev' is to add an API to the vfio_device world to
inject what iommu configuration is needed from that direction
instead of trying to discover it from a struct device.

3) The vfio_bus_is_mdev() and related symbol_get() nonsense in
drivers/vfio/vfio_iommu_type1.c has to go, for the same reasons
it was not acceptable to do this for the interrupt side either.

4) It seems pretty clear to me this will be heavily impacted by the
/dev/ioasid discussion. Please consider removing the dead code now.

Basically, please fix this before trying to get idxd mdev merged as
the first user.

Jason


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-06 Thread Jason Wang


在 2021/4/6 下午8:42, Jason Gunthorpe 写道:

On Tue, Apr 06, 2021 at 09:35:17AM +0800, Jason Wang wrote:


VFIO and VDPA has no buisness having map/unmap interfaces once we have
/dev/ioasid. That all belongs in the iosaid side.

I know they have those interfaces today, but that doesn't mean we have
to keep using them for PASID use cases, they should be replaced with a
'do dma from this pasid on /dev/ioasid' interface certainly not a
'here is a pasid from /dev/ioasid, go ahead and configure it youself'
interface
  
So it looks like the PASID was bound to SVA in this design. I think it's not

necessairly the case:

No, I wish people would stop talking about SVA.

SVA and vSVA are a very special narrow configuration of a PASID. There
are lots of other PASID configurations! That is the whole point, a
PASID is complicated, there are many configuration scenarios, they
need to be in one place with a very clearly defined uAPI



Right, that's my understanding as well.





1) PASID can be implemented without SVA, in this case a map/unmap interface
is still required

Any interface to manipulate a PASID should be under /dev/ioasid. We do
not want to duplicate this into every subsystem.



Yes.





2) For the case that hypervisor want to do some mediation in the middle for
a virtqueue. e.g in the case of control vq that is implemented in the
VF/ADI/SF itself, the hardware virtqueue needs to be controlled by Qemu,
Though binding qemu's page table to cvq can work but it looks like a
overkill, a small dedicated buffers that is mapped for this PASID seems more
suitalbe.

/dev/ioasid should allow userspace to setup any PASID configuration it
wants. There are many choices. That is the whole point, instead of
copying&pasting all the PASID configuration option into every
subsystem we have on place to configure it.

If you want a PASID (or generic ioasid) that has the guest physical
map, which is probably all that VDPA would ever want, then /dev/ioasid
should be able to prepare that.

If you just want to map a few buffers into a PASID then it should be
able to do that too.


So do you mean the device should not expose the PASID confiugration API to
guest? I think it could happen if we assign the whole device and let guest
to configure it for nested VMs.

This always needs co-operating with the vIOMMU driver. We can't have
nested PASID use without both parts working together.

The vIOMMU driver configures the PASID and assigns the mappings
(however complicated that turns out to be)

The VDPA/mdev driver authorizes the HW to use the ioasid mapping, eg
by authorizing a queue to issue PCIe TLPs with a specific PASID.

The authorization is triggered by the guest telling the vIOMMU to
allow a vRID to talk to a PASID, which qemu would have to translate to
telling something like the VDPA driver under the vRID that it can use
a PASID from /dev/ioasid

For security a VDPA/mdev device MUST NOT issue PASIDs that the vIOMMU
has not authorized its vRID to use. Otherwise the security model of
something like VFIO in the guest becomes completely broken.



Yes, that's how it should work.

Thanks




Jason



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v5.4 1/1] iommu/vt-d: Fix agaw for a supported 48 bit guest address width

2021-04-06 Thread Lu Baolu

Hi Saeed,

On 4/7/21 12:35 AM, Saeed Mirzamohammadi wrote:

The IOMMU driver calculates the guest addressability for a DMA request
based on the value of the mgaw reported from the IOMMU. However, this
is a fused value and as mentioned in the spec, the guest width
should be calculated based on the supported adjusted guest address width
(SAGAW).

This is from specification:
"Guest addressability for a given DMA request is limited to the
minimum of the value reported through this field and the adjusted
guest address width of the corresponding page-table structure.
(Adjusted guest address widths supported by hardware are reported
through the SAGAW field)."

This causes domain initialization to fail and following
errors appear for EHCI PCI driver:

[2.486393] ehci-pci :01:00.4: EHCI Host Controller
[2.486624] ehci-pci :01:00.4: new USB bus registered, assigned bus
number 1
[2.489127] ehci-pci :01:00.4: DMAR: Allocating domain failed
[2.489350] ehci-pci :01:00.4: DMAR: 32bit DMA uses non-identity
mapping
[2.489359] ehci-pci :01:00.4: can't setup: -12
[2.489531] ehci-pci :01:00.4: USB bus 1 deregistered
[2.490023] ehci-pci :01:00.4: init :01:00.4 fail, -12
[2.490358] ehci-pci: probe of :01:00.4 failed with error -12

This issue happens when the value of the sagaw corresponds to a
48-bit agaw. This fix updates the calculation of the agaw based on
the IOMMU's sagaw value.

Signed-off-by: Saeed Mirzamohammadi 
Tested-by: Camille Lu 
---
  drivers/iommu/intel-iommu.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 953d86ca6d2b..396e14fad54b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1867,8 +1867,8 @@ static int domain_init(struct dmar_domain *domain, struct 
intel_iommu *iommu,
domain_reserve_special_ranges(domain);
  
  	/* calculate AGAW */

-   if (guest_width > cap_mgaw(iommu->cap))
-   guest_width = cap_mgaw(iommu->cap);
+   if (guest_width > agaw_to_width(iommu->agaw))
+   guest_width = agaw_to_width(iommu->agaw);


The spec requires to use a minimum of MGAW and AGAW, so why not,

cap_width = min_t(int, cap_mgaw(iommu->cap), 
agaw_to_width(iommu->agaw));
if (guest_width > cap_width)
guest_width = cap_width;

Best regards,
baolu


domain->gaw = guest_width;
adjust_width = guestwidth_to_adjustwidth(guest_width);
agaw = width_to_agaw(adjust_width);


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v3 03/12] iommu/io-pgtable: Introduce map_pages() as a page table op

2021-04-06 Thread isaacm

On 2021-04-06 04:57, Will Deacon wrote:

On Mon, Apr 05, 2021 at 12:11:03PM -0700, Isaac J. Manjarres wrote:

Mapping memory into io-pgtables follows the same semantics
that unmapping memory used to follow (i.e. a buffer will be
mapped one page block per call to the io-pgtable code). This
means that it can be optimized in the same way that unmapping
memory was, so add a map_pages() callback to the io-pgtable
ops structure, so that a range of pages of the same size
can be mapped within the same call.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
---
 include/linux/io-pgtable.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 2ed0c057d9e7..019149b204b8 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -143,6 +143,7 @@ struct io_pgtable_cfg {
  * struct io_pgtable_ops - Page table manipulation API for IOMMU 
drivers.

  *
  * @map:  Map a physically contiguous memory region.
+ * @map_pages:Map a physically contiguous range of pages of the 
same size.

  * @unmap:Unmap a physically contiguous memory region.
  * @unmap_pages:  Unmap a range of virtually contiguous pages of the 
same size.

  * @iova_to_phys: Translate iova to physical address.
@@ -153,6 +154,9 @@ struct io_pgtable_cfg {
 struct io_pgtable_ops {
int (*map)(struct io_pgtable_ops *ops, unsigned long iova,
   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
+   int (*map_pages)(struct io_pgtable_ops *ops, unsigned long iova,
+phys_addr_t paddr, size_t pgsize, size_t pgcount,
+int prot, gfp_t gfp, size_t *mapped);


How about returning 'size_t' and using IS_ERR_VALUE() instead of adding
the extra 'mapped' argument (i.e. return the size of the region mapped
or an error code)? I don't think we realistically need to care about 
map

sizes that overlap with the error region.

I'd given that a shot before, but the problem that I kept running into 
was that
in case of an error, if I return an error code, I don't know how much 
memory
was mapped, so that I can invoke iommu_unmap from __iommu_map with that 
size to

undo the partial mappings from a map_pages() call.

Returning the amount of memory that was mapped in the case of an error 
will be
less than the size that was requested, but then we lose the information 
about why
the error happened, since the error code won't be returned, so that's 
why I went
with the current approach. Do you have any other ideas about how to 
handle this?


I'd thought of having arm_lpae_map_pages() invoke 
arm_lpae_unmap_pages(), but
the TLB maintenance was a problem, as we wouldn't invoke 
iommu_iotlb_sync().


Thanks,
Isaac

Will

___
linux-arm-kernel mailing list
linux-arm-ker...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v3 09/12] iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages()

2021-04-06 Thread isaacm

On 2021-04-06 05:15, Will Deacon wrote:

On Mon, Apr 05, 2021 at 12:11:09PM -0700, Isaac J. Manjarres wrote:

Implement the unmap_pages() callback for the ARM LPAE io-pgtable
format.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
---
 drivers/iommu/io-pgtable-arm.c | 124 
+++--

 1 file changed, 104 insertions(+), 20 deletions(-)


[...]

+static size_t arm_lpae_unmap_pages(struct io_pgtable_ops *ops, 
unsigned long iova,

+  size_t pgsize, size_t pgcount,
+  struct iommu_iotlb_gather *gather)
+{
+   struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+   struct io_pgtable_cfg *cfg = &data->iop.cfg;
+   arm_lpae_iopte *ptep = data->pgd;
+   long iaext = (s64)iova >> cfg->ias;
+   size_t unmapped = 0, unmapped_page;
+   int last_lvl;
+   size_t table_size, pages, tbl_offset, max_entries;
+
+	if (WARN_ON(!pgsize || (pgsize & cfg->pgsize_bitmap) != pgsize || 
!pgcount))

+   return 0;
+
+   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
+   iaext = ~iaext;
+   if (WARN_ON(iaext))
+   return 0;
+
+   /*
+* Calculating the page table size here helps avoid situations where
+	 * a page range that is being unmapped may be mapped at the same 
level

+* but not mapped by the same tables. Allowing such a scenario to
+* occur can complicate the logic in arm_lpae_split_blk_unmap().
+*/
+   last_lvl = ARM_LPAE_BLOCK_SIZE_LVL(pgsize, data);
+
+   if (last_lvl == data->start_level)
+   table_size = ARM_LPAE_PGD_SIZE(data);
+   else
+   table_size = ARM_LPAE_GRANULE(data);
+
+   max_entries = table_size / sizeof(*ptep);
+
+   while (pgcount) {
+   tbl_offset = ARM_LPAE_LVL_IDX(iova, last_lvl, data);
+   pages = min_t(size_t, pgcount, max_entries - tbl_offset);
+   unmapped_page = __arm_lpae_unmap(data, gather, iova, pgsize,
+pages, data->start_level,
+ptep);
+   if (!unmapped_page)
+   break;
+
+   unmapped += unmapped_page;
+   iova += unmapped_page;
+   pgcount -= pages;
+   }


Robin has some comments on the first version of this patch, and I
don't think you
addressed them:

https://lore.kernel.org/r/b93fa0b1-e2a4-1aad-8b88-4d0dfecdf...@arm.com

I'm inclined to agree that iterating here doesn't make a lot of sense 
--
we've already come back out of the page-table walk, so I think we 
should
just return to the caller (who is well prepared to handle a partial 
unmap).

Same for the map side of things.

If we get numbers showing that this is causing a performance issue, 
then

we should rework the page-table code to handle this at the lower level
(because I doubt the loop that you have is really worse than returning 
to

the caller anyway).


Sorry, I seem to have overlooked those comments.

I will go ahead and address them. I think it might be ideal to try to do
as much work as possible in the io-pgtable level, so as to minimize the 
number
of indirect calls incurred by jumping back and forth between iommu fwk, 
iommu

driver, and io-pgtable code.

Perhaps we can try something like how the linear mappings are created on 
arm64 i.e.
on the previous level, we can determine how many pages can be unmapped 
in one page table
in one iteration, and on the subsequent iterations, we can tackle 
another page table at
the lower level. Looking at the code, it doesn't seem too difficult to 
add this in. Thoughts?


Thanks,
Isaac


Will

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 7/9] vfio/mdev: Add iommu related member in mdev_device

2021-04-06 Thread Jason Gunthorpe
On Mon, Mar 25, 2019 at 09:30:34AM +0800, Lu Baolu wrote:
> A parent device might create different types of mediated
> devices. For example, a mediated device could be created
> by the parent device with full isolation and protection
> provided by the IOMMU. One usage case could be found on
> Intel platforms where a mediated device is an assignable
> subset of a PCI, the DMA requests on behalf of it are all
> tagged with a PASID. Since IOMMU supports PASID-granular
> translations (scalable mode in VT-d 3.0), this mediated
> device could be individually protected and isolated by an
> IOMMU.
> 
> This patch adds a new member in the struct mdev_device to
> indicate that the mediated device represented by mdev could
> be isolated and protected by attaching a domain to a device
> represented by mdev->iommu_device. It also adds a helper to
> add or set the iommu device.
> 
> * mdev_device->iommu_device
>   - This, if set, indicates that the mediated device could
> be fully isolated and protected by IOMMU via attaching
> an iommu domain to this device. If empty, it indicates
> using vendor defined isolation, hence bypass IOMMU.
> 
> * mdev_set/get_iommu_device(dev, iommu_device)
>   - Set or get the iommu device which represents this mdev
> in IOMMU's device scope. Drivers don't need to set the
> iommu device if it uses vendor defined isolation.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Liu Yi L 
> Suggested-by: Kevin Tian 
> Suggested-by: Alex Williamson 
> Signed-off-by: Lu Baolu 
> Reviewed-by: Jean-Philippe Brucker 
> --- 
>  drivers/vfio/mdev/mdev_core.c| 18 ++
>  drivers/vfio/mdev/mdev_private.h |  1 +
>  include/linux/mdev.h | 14 ++
>  3 files changed, 33 insertions(+)
> 
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index b96fedc77ee5..1b6435529166 100644
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -390,6 +390,24 @@ int mdev_device_remove(struct device *dev, bool 
> force_remove)
>   return 0;
>  }
>  
> +int mdev_set_iommu_device(struct device *dev, struct device *iommu_device)
> +{
> + struct mdev_device *mdev = to_mdev_device(dev);
> +
> + mdev->iommu_device = iommu_device;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(mdev_set_iommu_device);

I was looking at these functions when touching the mdev stuff and I
have some concerns.

1) Please don't merge dead code. It is a year later and there is still
   no in-tree user for any of this. This is not our process. Even
   worse it was exported so it looks like this dead code is supporting
   out of tree modules.

2) Why is this like this? Every struct device already has a connection
   to the iommu layer and every mdev has a struct device all its own.

   Why did we need to add special 'if (mdev)' stuff all over the
   place? This smells like the same abuse Thomas
   and I pointed out for the interrupt domains.

   After my next series the mdev drivers will have direct access to
   the vfio_device. So an alternative to using the struct device, or
   adding 'if mdev' is to add an API to the vfio_device world to
   inject what iommu configuration is needed from that direction
   instead of trying to discover it from a struct device.

3) The vfio_bus_is_mdev() and related symbol_get() nonsense in
   drivers/vfio/vfio_iommu_type1.c has to go, for the same reasons
   it was not acceptable to do this for the interrupt side either.

4) It seems pretty clear to me this will be heavily impacted by the
   /dev/ioasid discussion. Please consider removing the dead code now.

Basically, please fix this before trying to get idxd mdev merged as
the first user.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v3 06/12] iommu: Split 'addr_merge' argument to iommu_pgsize() into separate parts

2021-04-06 Thread isaacm

On 2021-04-06 04:53, Will Deacon wrote:

On Mon, Apr 05, 2021 at 12:11:06PM -0700, Isaac J. Manjarres wrote:

From: Will Deacon 

The 'addr_merge' parameter to iommu_pgsize() is a fabricated address
intended to describe the alignment requirements to consider when
choosing an appropriate page size. On the iommu_map() path, this 
address

is the logical OR of the virtual and physical addresses.

Subsequent improvements to iommu_pgsize() will need to check the
alignment of the virtual and physical components of 'addr_merge'
independently, so pass them in as separate parameters and reconstruct
'addr_merge' locally.

No functional change.

Signed-off-by: Will Deacon 
Signed-off-by: Isaac J. Manjarres 
---
 drivers/iommu/iommu.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 9006397b6604..a3bbf7e310b0 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2357,12 +2357,13 @@ phys_addr_t iommu_iova_to_phys(struct 
iommu_domain *domain, dma_addr_t iova)

 }
 EXPORT_SYMBOL_GPL(iommu_iova_to_phys);

-static size_t iommu_pgsize(struct iommu_domain *domain,
-  unsigned long addr_merge, size_t size)
+static size_t iommu_pgsize(struct iommu_domain *domain, unsigned long 
iova,

+  phys_addr_t paddr, size_t size)
 {
unsigned int pgsize_idx;
unsigned long pgsizes;
size_t pgsize;
+   phys_addr_t addr_merge = paddr | iova;


Huh, so this was 'unsigned long' before and, given that the 
pgsize_bitmap

on the domain is also unsigned long, then I think that's fine. So using
that would mean you don't need GENMASK_ULL for this guy either.

Will

Thanks, I will address this in version 4 of the series.

--Isaac
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/6] iommu: Move IOVA power-of-2 roundup into allocator

2021-04-06 Thread John Garry

So then we have the issue of how to dynamically increase this rcache
threshold. The problem is that we may have many devices associated with
the same domain. So, in theory, we can't assume that when we increase
the threshold that some other device will try to fast free an IOVA 
which

was allocated prior to the increase and was not rounded up.

I'm very open to better (or less bad) suggestions on how to do this ...

...but yes, regardless of exactly where it happens, rounding up or not
is the problem for rcaches in general. I've said several times that my
preferred approach is to not change it that dynamically at all, but
instead treat it more like we treat the default domain type.



Can you remind me of that idea? I don't remember you mentioning using 
default domain handling as a reference in any context.




Hi Robin,

Sorry if the phrasing was unclear there - the allusion to default 
domains is new, it just occurred to me that what we do there is in fact 
fairly close to what I've suggested previously for this. In that case, 
we have a global policy set by the command line, which *can* be 
overridden per-domain via sysfs at runtime, provided the user is willing 
to tear the whole thing down. Using a similar approach here would give a 
fair degree of flexibility but still mean that changes never have to be 
made dynamically to a live domain.


So are you saying that we can handle it similar to how we now can handle 
changing default domain for an IOMMU group via sysfs? If so, that just 
is not practical here. Reason being that this particular DMA engine 
provides the block device giving / mount point, so if we unbind the 
driver, we lose / mount point.


And I am not sure if the end user would even know how to set such a 
tunable. Or, in this case, why the end user would not want the optimized 
range configured always.


I'd still rather if the device driver could provide info which can be 
used to configure this before or during probing.


Cheers,
John
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5.4 1/1] iommu/vt-d: Fix agaw for a supported 48 bit guest address width

2021-04-06 Thread Saeed Mirzamohammadi
The IOMMU driver calculates the guest addressability for a DMA request
based on the value of the mgaw reported from the IOMMU. However, this
is a fused value and as mentioned in the spec, the guest width
should be calculated based on the supported adjusted guest address width
(SAGAW).

This is from specification:
"Guest addressability for a given DMA request is limited to the
minimum of the value reported through this field and the adjusted
guest address width of the corresponding page-table structure.
(Adjusted guest address widths supported by hardware are reported
through the SAGAW field)."

This causes domain initialization to fail and following
errors appear for EHCI PCI driver:

[2.486393] ehci-pci :01:00.4: EHCI Host Controller
[2.486624] ehci-pci :01:00.4: new USB bus registered, assigned bus
number 1
[2.489127] ehci-pci :01:00.4: DMAR: Allocating domain failed
[2.489350] ehci-pci :01:00.4: DMAR: 32bit DMA uses non-identity
mapping
[2.489359] ehci-pci :01:00.4: can't setup: -12
[2.489531] ehci-pci :01:00.4: USB bus 1 deregistered
[2.490023] ehci-pci :01:00.4: init :01:00.4 fail, -12
[2.490358] ehci-pci: probe of :01:00.4 failed with error -12

This issue happens when the value of the sagaw corresponds to a
48-bit agaw. This fix updates the calculation of the agaw based on
the IOMMU's sagaw value.

Signed-off-by: Saeed Mirzamohammadi 
Tested-by: Camille Lu 
---
 drivers/iommu/intel-iommu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 953d86ca6d2b..396e14fad54b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1867,8 +1867,8 @@ static int domain_init(struct dmar_domain *domain, struct 
intel_iommu *iommu,
domain_reserve_special_ranges(domain);
 
/* calculate AGAW */
-   if (guest_width > cap_mgaw(iommu->cap))
-   guest_width = cap_mgaw(iommu->cap);
+   if (guest_width > agaw_to_width(iommu->agaw))
+   guest_width = agaw_to_width(iommu->agaw);
domain->gaw = guest_width;
adjust_width = guestwidth_to_adjustwidth(guest_width);
agaw = width_to_agaw(adjust_width);
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH RESEND] lib/scatterlist: Fix NULL pointer deference

2021-04-06 Thread Ricardo Ribalda
When sg_alloc_table_from_pages is called with n_pages = 0, we write in a
non-allocated page. Fix it by checking early the error condition.

[7.666801] BUG: kernel NULL pointer dereference, address: 0010
[7.667487] #PF: supervisor read access in kernel mode
[7.667970] #PF: error_code(0x) - not-present page
[7.668448] PGD 0 P4D 0
[7.668690] Oops:  [#1] SMP NOPTI
[7.669037] CPU: 0 PID: 184 Comm: modprobe Not tainted 5.11.0+ #2
[7.669606] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.14.0-2 04/01/2014
[7.670378] RIP: 0010:__sg_alloc_table_from_pages+0x2c5/0x4a0
[7.670924] Code: c9 01 48 c7 40 08 00 00 00 00 48 89 08 8b 47 0c 41 8d 44 
00 ff 89 47 0c 48 81 fa 00 f0 ff ff 0f 87 d4 01 00 00 49 8b 16 89 d8 <4a> 8b 74 
fd 00 4c 89 d1 44 29 f8 c1 e0 0c 44 29 d8 4c 39 d0 48 0f
[7.672643] RSP: 0018:ba1e8028fb30 EFLAGS: 00010287
[7.673133] RAX: 0001 RBX: 0001 RCX: 0002
[7.673791] RDX: 0002 RSI: ada6d0ba RDI: 9afe01fff820
[7.674448] RBP: 0010 R08: 0001 R09: 0001
[7.675100] R10:  R11:  R12: 
[7.675754] R13: f000 R14: 9afe01fff800 R15: 
[7.676409] FS:  7fb0f448f540() GS:9afe07a0() 
knlGS:
[7.677151] CS:  0010 DS:  ES:  CR0: 80050033
[7.677681] CR2: 0010 CR3: 02184001 CR4: 00370ef0
[7.678342] DR0:  DR1:  DR2: 
[7.679019] DR3:  DR6: fffe0ff0 DR7: 0400
[7.680349] Call Trace:
[7.680605]  ? device_add+0x146/0x810
[7.681021]  sg_alloc_table_from_pages+0x11/0x30
[7.681511]  vb2_dma_sg_alloc+0x162/0x280 [videobuf2_dma_sg]

Cc: sta...@vger.kernel.org
Fixes: efc42bc98058 ("scatterlist: add sg_alloc_table_from_pages function")
Signed-off-by: Ricardo Ribalda 
---
 lib/scatterlist.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index a59778946404..1e83b6a3d930 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -435,6 +435,9 @@ struct scatterlist *__sg_alloc_table_from_pages(struct 
sg_table *sgt,
unsigned int added_nents = 0;
struct scatterlist *s = prv;
 
+   if (n_pages == 0)
+   return ERR_PTR(-EINVAL);
+
/*
 * The algorithm below requires max_segment to be aligned to PAGE_SIZE
 * otherwise it can overshoot.
-- 
2.31.0.208.g409f899ff0-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v14 09/10] ACPI/IORT: Enable stall support for platform devices

2021-04-06 Thread Lorenzo Pieralisi
On Thu, Apr 01, 2021 at 05:47:18PM +0200, Jean-Philippe Brucker wrote:
> Copy the "Stall supported" bit, that tells whether a named component
> supports stall, into the dma-can-stall device property.
> 
> Acked-by: Jonathan Cameron 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/acpi/arm64/iort.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Acked-by: Lorenzo Pieralisi 

> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index 3912a1f6058e..0828f70cb782 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -968,13 +968,15 @@ static int iort_pci_iommu_init(struct pci_dev *pdev, 
> u16 alias, void *data)
>  static void iort_named_component_init(struct device *dev,
> struct acpi_iort_node *node)
>  {
> - struct property_entry props[2] = {};
> + struct property_entry props[3] = {};
>   struct acpi_iort_named_component *nc;
>  
>   nc = (struct acpi_iort_named_component *)node->node_data;
>   props[0] = PROPERTY_ENTRY_U32("pasid-num-bits",
> FIELD_GET(ACPI_IORT_NC_PASID_BITS,
>   nc->node_flags));
> + if (nc->node_flags & ACPI_IORT_NC_STALL_SUPPORTED)
> + props[1] = PROPERTY_ENTRY_BOOL("dma-can-stall");
>  
>   if (device_add_properties(dev, props))
>   dev_warn(dev, "Could not add device properties\n");
> -- 
> 2.31.1
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-06 Thread Jason Gunthorpe
On Tue, Apr 06, 2021 at 09:35:17AM +0800, Jason Wang wrote:

> > VFIO and VDPA has no buisness having map/unmap interfaces once we have
> > /dev/ioasid. That all belongs in the iosaid side.
> > 
> > I know they have those interfaces today, but that doesn't mean we have
> > to keep using them for PASID use cases, they should be replaced with a
> > 'do dma from this pasid on /dev/ioasid' interface certainly not a
> > 'here is a pasid from /dev/ioasid, go ahead and configure it youself'
> > interface
>  
> So it looks like the PASID was bound to SVA in this design. I think it's not
> necessairly the case:

No, I wish people would stop talking about SVA.

SVA and vSVA are a very special narrow configuration of a PASID. There
are lots of other PASID configurations! That is the whole point, a
PASID is complicated, there are many configuration scenarios, they
need to be in one place with a very clearly defined uAPI

> 1) PASID can be implemented without SVA, in this case a map/unmap interface
> is still required

Any interface to manipulate a PASID should be under /dev/ioasid. We do
not want to duplicate this into every subsystem.

> 2) For the case that hypervisor want to do some mediation in the middle for
> a virtqueue. e.g in the case of control vq that is implemented in the
> VF/ADI/SF itself, the hardware virtqueue needs to be controlled by Qemu,
> Though binding qemu's page table to cvq can work but it looks like a
> overkill, a small dedicated buffers that is mapped for this PASID seems more
> suitalbe.

/dev/ioasid should allow userspace to setup any PASID configuration it
wants. There are many choices. That is the whole point, instead of
copying&pasting all the PASID configuration option into every
subsystem we have on place to configure it.

If you want a PASID (or generic ioasid) that has the guest physical
map, which is probably all that VDPA would ever want, then /dev/ioasid
should be able to prepare that.

If you just want to map a few buffers into a PASID then it should be
able to do that too.

> So do you mean the device should not expose the PASID confiugration API to
> guest? I think it could happen if we assign the whole device and let guest
> to configure it for nested VMs.

This always needs co-operating with the vIOMMU driver. We can't have
nested PASID use without both parts working together.

The vIOMMU driver configures the PASID and assigns the mappings
(however complicated that turns out to be)

The VDPA/mdev driver authorizes the HW to use the ioasid mapping, eg
by authorizing a queue to issue PCIe TLPs with a specific PASID.

The authorization is triggered by the guest telling the vIOMMU to
allow a vRID to talk to a PASID, which qemu would have to translate to
telling something like the VDPA driver under the vRID that it can use
a PASID from /dev/ioasid

For security a VDPA/mdev device MUST NOT issue PASIDs that the vIOMMU
has not authorized its vRID to use. Otherwise the security model of
something like VFIO in the guest becomes completely broken.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-06 Thread Jason Gunthorpe
On Tue, Apr 06, 2021 at 01:27:15AM +, Tian, Kevin wrote:
> 
> and here is one example why using existing VFIO/VDPA interface makes
> sense. say dev1 (w/ sva) and dev2 (w/o sva) are placed in a single VFIO 
> container. 

Forget about SVA, it is an irrelevant detail of how a PASID is
configured.

> The container is associated to an iommu domain which contains a
> single 2nd-level page table, shared by both devices (when attached
> to the domain).

This level should be described by an ioasid.

> The VFIO MAP operation is applied to the 2nd-level
> page table thus naturally applied to both devices. Then userspace
> could use /dev/ioasid to further allocate IOASIDs and bind multiple
> 1st-level page tables for dev1, nested on the shared 2nd-level page
> table.

Because if you don't then we enter insane world where a PASID is being
created under /dev/ioasid but its translation path flows through setup
done by VFIO and the whole user API becomes an incomprehensible mess.

How will you even associate the PASID with the other translation??

The entire translation path for any ioasid or PASID should be defined
only by /dev/ioasid. Everything else is a legacy API.

> If following your suggestion then VFIO must deny VFIO MAP operations
> on sva1 (assume userspace should not mix sva1 and sva2 in the same
> container and instead use /dev/ioasid to map for sva1)? 

No, userspace creates an iosaid for the guest physical mapping and
passes this ioasid to VFIO PCI which will assign it as the first layer
mapping on the RID

When PASIDs are allocated the uAPI will be told to logically nested
under the first ioasid. When VFIO authorizes a PASID for a RID it
checks that all the HW rules are being followed.

If there are rules like groups of VFIO devices must always use the
same IOASID then VFIO will check these too (and realistically qemu
will have only one guest physical map ioasid anyhow)

There is no real difference between setting up an IOMMU table for a
(RID,PASID) tuple or just a RID. We can do it universally with
one interface for all consumers.

I wanted this when we were doing VDPA for the first time, now that we
are doing pasid and more difficult stuff I view it as essential.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-06 Thread Jason Gunthorpe
On Tue, Apr 06, 2021 at 01:02:05AM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe 
> > Sent: Tuesday, April 6, 2021 7:40 AM
> > 
> > On Fri, Apr 02, 2021 at 07:58:02AM +, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe 
> > > > Sent: Thursday, April 1, 2021 9:47 PM
> > > >
> > > > On Thu, Apr 01, 2021 at 01:43:36PM +, Liu, Yi L wrote:
> > > > > > From: Jason Gunthorpe 
> > > > > > Sent: Thursday, April 1, 2021 9:16 PM
> > > > > >
> > > > > > On Thu, Apr 01, 2021 at 01:10:48PM +, Liu, Yi L wrote:
> > > > > > > > From: Jason Gunthorpe 
> > > > > > > > Sent: Thursday, April 1, 2021 7:47 PM
> > > > > > > [...]
> > > > > > > > I'm worried Intel views the only use of PASID in a guest is with
> > > > > > > > ENQCMD, but that is not consistent with the industry. We need to
> > see
> > > > > > > > normal nested PASID support with assigned PCI VFs.
> > > > > > >
> > > > > > > I'm not quire flow here. Intel also allows PASID usage in guest
> > without
> > > > > > > ENQCMD. e.g. Passthru a PF to guest, and use PASID on it without
> > > > > > ENQCMD.
> > > > > >
> > > > > > Then you need all the parts, the hypervisor calls from the vIOMMU,
> > and
> > > > > > you can't really use a vPASID.
> > > > >
> > > > > This is a diagram shows the vSVA setup.
> > > >
> > > > I'm not talking only about vSVA. Generic PASID support with arbitary
> > > > mappings.
> > > >
> > > > And how do you deal with the vPASID vs pPASID issue if the system has
> > > > a mix of physical devices and mdevs?
> > > >
> > >
> > > We plan to support two schemes. One is vPASID identity-mapped to
> > > pPASID then the mixed scenario just works, with the limitation of
> > > lacking of live migration support. The other is non-identity-mapped
> > > scheme, where live migration is supported but physical devices and
> > > mdevs should not be mixed in one VM if both expose SVA capability
> > > (requires some filtering check in Qemu).
> > 
> > That just becomes "block vPASID support if any device that
> > doesn't use ENQCMD is plugged into the guest"
> 
> The limitation is only for physical device. and in reality it is not that
> bad. To support live migration with physical device we anyway need 
> additional work to migrate the device state (e.g. based on Max's work), 
> then it's not unreasonable to also mediate guest programming of 
> device specific PASID register to enable vPASID (need to translate in
> the whole VM lifespan but likely is not a hot path).

IMHO that is pretty unreasonable.. More likely we end up with vPASID
tables in each migratable device like KVM has.

> > Which needs a special VFIO capability of some kind so qemu knows to
> > block it. This really needs to all be layed out together so someone
> > can understand it :(
> 
> Or could simply based on whether the VFIO device supports live migration.

You need to define affirmative caps that indicate that vPASID will be
supported by the VFIO device.

> > Why doesn't the siov cookbook explaining this stuff??
> > 
> > > We hope the /dev/ioasid can support both schemes, with the minimal
> > > requirement of allowing userspace to tag a vPASID to a pPASID and
> > > allowing mdev to translate vPASID into pPASID, i.e. not assuming that
> > > the guest will always use pPASID.
> > 
> > What I'm a unclear of is if /dev/ioasid even needs to care about
> > vPASID or if vPASID is just a hidden artifact of the KVM connection to
> > setup the translation table and the vIOMMU driver in qemu.
> 
> Not just for KVM. Also required by mdev, which needs to translate
> vPASID into pPASID when ENQCMD is not used.

Do we have any mdev's that will do this?

> should only care about the operations related to pPASID. VFIO could
> carry vPASID information to mdev.

It depends how common this is, I suppose

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v3 11/12] iommu/arm-smmu: Implement the unmap_pages() IOMMU driver callback

2021-04-06 Thread Will Deacon
On Mon, Apr 05, 2021 at 12:11:11PM -0700, Isaac J. Manjarres wrote:
> Implement the unmap_pages() callback for the ARM SMMU driver
> to allow calls from iommu_unmap to unmap multiple pages of
> the same size in one call.
> 
> Signed-off-by: Isaac J. Manjarres 
> Suggested-by: Will Deacon 
> ---
>  drivers/iommu/arm/arm-smmu/arm-smmu.c | 19 +++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
> b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index d8c6bfde6a61..f29f1fb109f8 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -1225,6 +1225,24 @@ static size_t arm_smmu_unmap(struct iommu_domain 
> *domain, unsigned long iova,
>   return ret;
>  }
>  
> +static size_t arm_smmu_unmap_pages(struct iommu_domain *domain, unsigned 
> long iova,
> +size_t pgsize, size_t pgcount,
> +struct iommu_iotlb_gather *iotlb_gather)
> +{
> + struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
> + struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
> + size_t ret;
> +
> + if (!ops)
> + return 0;
> +
> + arm_smmu_rpm_get(smmu);
> + ret = ops->unmap_pages(ops, iova, pgsize, pgcount, iotlb_gather);
> + arm_smmu_rpm_put(smmu);
> +
> + return ret;
> +}

Doesn't this go wrong if we're using the short-descriptor page-table
format? (same for the next patch)

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v3 09/12] iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages()

2021-04-06 Thread Will Deacon
On Mon, Apr 05, 2021 at 12:11:09PM -0700, Isaac J. Manjarres wrote:
> Implement the unmap_pages() callback for the ARM LPAE io-pgtable
> format.
> 
> Signed-off-by: Isaac J. Manjarres 
> Suggested-by: Will Deacon 
> ---
>  drivers/iommu/io-pgtable-arm.c | 124 +++--
>  1 file changed, 104 insertions(+), 20 deletions(-)

[...]

> +static size_t arm_lpae_unmap_pages(struct io_pgtable_ops *ops, unsigned long 
> iova,
> +size_t pgsize, size_t pgcount,
> +struct iommu_iotlb_gather *gather)
> +{
> + struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
> + struct io_pgtable_cfg *cfg = &data->iop.cfg;
> + arm_lpae_iopte *ptep = data->pgd;
> + long iaext = (s64)iova >> cfg->ias;
> + size_t unmapped = 0, unmapped_page;
> + int last_lvl;
> + size_t table_size, pages, tbl_offset, max_entries;
> +
> + if (WARN_ON(!pgsize || (pgsize & cfg->pgsize_bitmap) != pgsize || 
> !pgcount))
> + return 0;
> +
> + if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
> + iaext = ~iaext;
> + if (WARN_ON(iaext))
> + return 0;
> +
> + /*
> +  * Calculating the page table size here helps avoid situations where
> +  * a page range that is being unmapped may be mapped at the same level
> +  * but not mapped by the same tables. Allowing such a scenario to
> +  * occur can complicate the logic in arm_lpae_split_blk_unmap().
> +  */
> + last_lvl = ARM_LPAE_BLOCK_SIZE_LVL(pgsize, data);
> +
> + if (last_lvl == data->start_level)
> + table_size = ARM_LPAE_PGD_SIZE(data);
> + else
> + table_size = ARM_LPAE_GRANULE(data);
> +
> + max_entries = table_size / sizeof(*ptep);
> +
> + while (pgcount) {
> + tbl_offset = ARM_LPAE_LVL_IDX(iova, last_lvl, data);
> + pages = min_t(size_t, pgcount, max_entries - tbl_offset);
> + unmapped_page = __arm_lpae_unmap(data, gather, iova, pgsize,
> +  pages, data->start_level,
> +  ptep);
> + if (!unmapped_page)
> + break;
> +
> + unmapped += unmapped_page;
> + iova += unmapped_page;
> + pgcount -= pages;
> + }

Robin has some comments on the first version of this patch, and I don't think 
you
addressed them:

https://lore.kernel.org/r/b93fa0b1-e2a4-1aad-8b88-4d0dfecdf...@arm.com

I'm inclined to agree that iterating here doesn't make a lot of sense --
we've already come back out of the page-table walk, so I think we should
just return to the caller (who is well prepared to handle a partial unmap).
Same for the map side of things.

If we get numbers showing that this is causing a performance issue, then
we should rework the page-table code to handle this at the lower level
(because I doubt the loop that you have is really worse than returning to
the caller anyway).

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-06 Thread Jason Gunthorpe
On Tue, Apr 06, 2021 at 12:37:35AM +, Tian, Kevin wrote:

> With nested translation it is GVA->GPA->HPA. The kernel needs to
> fix fault related to GPA->HPA (managed by VFIO/VDPA) while 
> handle_mm_fault only handles HVA->HPA. In this case, the 2nd-level
> page fault is expected to be delivered to VFIO/VDPA first which then
> find HVA related to GPA, call handle_mm_fault to fix HVA->HPA,
> and then call iommu_map to fix GPA->HPA in the IOMMU page table.
> This is exactly like how CPU EPT violation is handled.

No, it should all be in the /dev/ioasid layer not duplicated into
every user.

> > If the fault needs to be fixed in the guest, then it needs to be
> > delivered over /dev/ioasid in some way and injected into the
> > vIOMMU. VFIO and VDPA have nothing to do with vIOMMU driver in quemu.
> > 
> > You need to have an interface under /dev/ioasid to create both page
> > table levels and part of that will be to tell the kernel what VA is
> > mapped and how to handle faults.
> 
> VFIO/VDPA already have their own interface to manage GPA->HPA
> mappings. Why do we want to duplicate it in /dev/ioasid? 

They have their own interface to manage other types of HW, we should
not duplicate PASID programming into there too.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v3 04/12] iommu: Add a map_pages() op for IOMMU drivers

2021-04-06 Thread Will Deacon
On Mon, Apr 05, 2021 at 12:11:04PM -0700, Isaac J. Manjarres wrote:
> Add a callback for IOMMU drivers to provide a path for the
> IOMMU framework to call into an IOMMU driver, which can
> call into the io-pgtable code, to map a physically contiguous
> rnage of pages of the same size.
> 
> For IOMMU drivers that do not specify a map_pages() callback,
> the existing logic of mapping memory one page block at a time
> will be used.
> 
> Signed-off-by: Isaac J. Manjarres 
> Suggested-by: Will Deacon 
> ---
>  include/linux/iommu.h | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 9cf81242581a..528d6a58479e 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -192,6 +192,8 @@ struct iommu_iotlb_gather {
>   * @attach_dev: attach device to an iommu domain
>   * @detach_dev: detach device from an iommu domain
>   * @map: map a physically contiguous memory region to an iommu domain
> + * @map_pages: map a physically contiguous set of pages of the same size to
> + * an iommu domain.
>   * @unmap: unmap a physically contiguous memory region from an iommu domain
>   * @unmap_pages: unmap a number of pages of the same size from an iommu 
> domain
>   * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain
> @@ -244,6 +246,9 @@ struct iommu_ops {
>   void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
>   int (*map)(struct iommu_domain *domain, unsigned long iova,
>  phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
> + int (*map_pages)(struct iommu_domain *domain, unsigned long iova,
> +  phys_addr_t paddr, size_t pgsize, size_t pgcount,
> +  int prot, gfp_t gfp, size_t *mapped);

(same comment as for the io-pgtable callback).

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v3 03/12] iommu/io-pgtable: Introduce map_pages() as a page table op

2021-04-06 Thread Will Deacon
On Mon, Apr 05, 2021 at 12:11:03PM -0700, Isaac J. Manjarres wrote:
> Mapping memory into io-pgtables follows the same semantics
> that unmapping memory used to follow (i.e. a buffer will be
> mapped one page block per call to the io-pgtable code). This
> means that it can be optimized in the same way that unmapping
> memory was, so add a map_pages() callback to the io-pgtable
> ops structure, so that a range of pages of the same size
> can be mapped within the same call.
> 
> Signed-off-by: Isaac J. Manjarres 
> Suggested-by: Will Deacon 
> ---
>  include/linux/io-pgtable.h | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
> index 2ed0c057d9e7..019149b204b8 100644
> --- a/include/linux/io-pgtable.h
> +++ b/include/linux/io-pgtable.h
> @@ -143,6 +143,7 @@ struct io_pgtable_cfg {
>   * struct io_pgtable_ops - Page table manipulation API for IOMMU drivers.
>   *
>   * @map:  Map a physically contiguous memory region.
> + * @map_pages:Map a physically contiguous range of pages of the same 
> size.
>   * @unmap:Unmap a physically contiguous memory region.
>   * @unmap_pages:  Unmap a range of virtually contiguous pages of the same 
> size.
>   * @iova_to_phys: Translate iova to physical address.
> @@ -153,6 +154,9 @@ struct io_pgtable_cfg {
>  struct io_pgtable_ops {
>   int (*map)(struct io_pgtable_ops *ops, unsigned long iova,
>  phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
> + int (*map_pages)(struct io_pgtable_ops *ops, unsigned long iova,
> +  phys_addr_t paddr, size_t pgsize, size_t pgcount,
> +  int prot, gfp_t gfp, size_t *mapped);

How about returning 'size_t' and using IS_ERR_VALUE() instead of adding
the extra 'mapped' argument (i.e. return the size of the region mapped
or an error code)? I don't think we realistically need to care about map
sizes that overlap with the error region.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v3 06/12] iommu: Split 'addr_merge' argument to iommu_pgsize() into separate parts

2021-04-06 Thread Will Deacon
On Mon, Apr 05, 2021 at 12:11:06PM -0700, Isaac J. Manjarres wrote:
> From: Will Deacon 
> 
> The 'addr_merge' parameter to iommu_pgsize() is a fabricated address
> intended to describe the alignment requirements to consider when
> choosing an appropriate page size. On the iommu_map() path, this address
> is the logical OR of the virtual and physical addresses.
> 
> Subsequent improvements to iommu_pgsize() will need to check the
> alignment of the virtual and physical components of 'addr_merge'
> independently, so pass them in as separate parameters and reconstruct
> 'addr_merge' locally.
> 
> No functional change.
> 
> Signed-off-by: Will Deacon 
> Signed-off-by: Isaac J. Manjarres 
> ---
>  drivers/iommu/iommu.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 9006397b6604..a3bbf7e310b0 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2357,12 +2357,13 @@ phys_addr_t iommu_iova_to_phys(struct iommu_domain 
> *domain, dma_addr_t iova)
>  }
>  EXPORT_SYMBOL_GPL(iommu_iova_to_phys);
>  
> -static size_t iommu_pgsize(struct iommu_domain *domain,
> -unsigned long addr_merge, size_t size)
> +static size_t iommu_pgsize(struct iommu_domain *domain, unsigned long iova,
> +phys_addr_t paddr, size_t size)
>  {
>   unsigned int pgsize_idx;
>   unsigned long pgsizes;
>   size_t pgsize;
> + phys_addr_t addr_merge = paddr | iova;

Huh, so this was 'unsigned long' before and, given that the pgsize_bitmap
on the domain is also unsigned long, then I think that's fine. So using
that would mean you don't need GENMASK_ULL for this guy either.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v3 05/12] iommu: Use bitmap to calculate page size in iommu_pgsize()

2021-04-06 Thread Will Deacon
On Mon, Apr 05, 2021 at 12:11:05PM -0700, Isaac J. Manjarres wrote:
> From: Will Deacon 
> 
> Avoid the potential for shifting values by amounts greater than the
> width of their type by using a bitmap to compute page size in
> iommu_pgsize().
> 
> Signed-off-by: Will Deacon 
> Signed-off-by: Isaac J. Manjarres 
> ---
>  drivers/iommu/iommu.c | 31 ---
>  1 file changed, 12 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index d0b0a15dba84..9006397b6604 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -8,6 +8,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -2360,30 +2361,22 @@ static size_t iommu_pgsize(struct iommu_domain 
> *domain,
>  unsigned long addr_merge, size_t size)
>  {
>   unsigned int pgsize_idx;
> + unsigned long pgsizes;
>   size_t pgsize;
>  
> - /* Max page size that still fits into 'size' */
> - pgsize_idx = __fls(size);
> + /* Page sizes supported by the hardware and small enough for @size */
> + pgsizes = domain->pgsize_bitmap & GENMASK_ULL(__fls(size), 0);

See my comments on the other thread, but I don't think it's necessary to use
the _ULL versions everywhere.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 3/6] iommu: Use bitmap to calculate page size in iommu_pgsize()

2021-04-06 Thread Will Deacon
On Thu, Apr 01, 2021 at 06:39:35PM -0700, isa...@codeaurora.org wrote:
> On 2021-04-01 09:47, Will Deacon wrote:
> > Avoid the potential for shifting values by amounts greater than the
> > width of their type by using a bitmap to compute page size in
> > iommu_pgsize().
> > 
> > Signed-off-by: Will Deacon 
> > ---
> >  drivers/iommu/iommu.c | 31 ---
> >  1 file changed, 12 insertions(+), 19 deletions(-)
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index d0b0a15dba84..bcd623862bf9 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -8,6 +8,7 @@
> > 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -2360,30 +2361,22 @@ static size_t iommu_pgsize(struct iommu_domain
> > *domain,
> >unsigned long addr_merge, size_t size)
> >  {
> > unsigned int pgsize_idx;
> > +   unsigned long pgsizes;
> > size_t pgsize;
> > 
> > -   /* Max page size that still fits into 'size' */
> > -   pgsize_idx = __fls(size);
> > +   /* Page sizes supported by the hardware and small enough for @size */
> > +   pgsizes = domain->pgsize_bitmap & GENMASK(__fls(size), 0);
> I've fixed this in the latest RFC for the iommu_map/unmap optimization
> patches,
> but for the sake of completeness: I think this should be GENMASK_ULL, in
> case
> __fls(size) >= 32.

Hmm, but 'size' is a size_t; which architectures have sizeof(size_t) >
sizeof(unsigned long)?

> > -   /* need to consider alignment requirements ? */
> > -   if (likely(addr_merge)) {
> > -   /* Max page size allowed by address */
> > -   unsigned int align_pgsize_idx = __ffs(addr_merge);
> > -   pgsize_idx = min(pgsize_idx, align_pgsize_idx);
> > -   }
> > -
> > -   /* build a mask of acceptable page sizes */
> > -   pgsize = (1UL << (pgsize_idx + 1)) - 1;
> > -
> > -   /* throw away page sizes not supported by the hardware */
> > -   pgsize &= domain->pgsize_bitmap;
> > +   /* Constrain the page sizes further based on the maximum alignment */
> > +   if (likely(addr_merge))
> > +   pgsizes &= GENMASK(__ffs(addr_merge), 0);

This one looks like more of an issue, though, as addr_merge is a
phys_addr_t, which certainly can be 64-bit where unsigned long is 32-bit
(e.g. Armv7 + LPAE)

Rather than make everything _ULL, please can you just do it where we're
actually using the larger types?

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu: Add device name to iommu map/unmap trace events

2021-04-06 Thread Joerg Roedel
On Tue, Apr 06, 2021 at 02:56:53PM +0800, chenxiang (M) wrote:
> Is it possible to use group id to identify different domains?

No, the best is to do this during post-processing of your traces by also
keeping tack of domain-device attachments/detachments.

Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 0/4] iommu/iova: Add CPU hotplug handler to flush rcaches to core code

2021-04-06 Thread John Garry

On 25/03/2021 17:53, Will Deacon wrote:

On Thu, Mar 25, 2021 at 08:29:57PM +0800, John Garry wrote:

The Intel IOMMU driver supports flushing the per-CPU rcaches when a CPU is
offlined.

Let's move it to core code, so everyone can take advantage.

Also throw in a patch to stop exporting free_iova_fast().

Differences to v1:
- Add RB tags (thanks!)
- Add patch to stop exporting free_iova_fast()
- Drop patch to correct comment
- Add patch to delete iommu_dma_free_cpu_cached_iovas() and associated
   changes

John Garry (4):
   iova: Add CPU hotplug handler to flush rcaches
   iommu/vt-d: Remove IOVA domain rcache flushing for CPU offlining
   iommu: Delete iommu_dma_free_cpu_cached_iovas()
   iommu: Stop exporting free_iova_fast()

Looks like this is all set for 5.13, so hopefully Joerg can stick it in
-next for a bit more exposure.




Hi Joerg,

Can you kindly consider picking this up now?

Thanks,
John

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v14 00/10] iommu: I/O page faults for SMMUv3

2021-04-06 Thread Jean-Philippe Brucker
On Thu, Apr 01, 2021 at 06:15:02PM +0100, Will Deacon wrote:
> On Thu, Apr 01, 2021 at 05:47:09PM +0200, Jean-Philippe Brucker wrote:
> > Add stall support to the SMMUv3 driver, along with a common I/O Page
> > Fault handler.
> > 
> > Since [v13] I added review and ack tags (Thanks!), and a lockdep_assert.
> > It would be good to have all of it in v5.13, since patch 10 introduces
> > the first user for the IOPF interface from patch 6.  But if that's not
> > possible, please pick patches 1-6 so the Vt-d driver can start using
> > them.
> 
> Patches 1-7 look good to me, but I'm not convinced about the utility of
> stalling faults so I'd prefer the later patches to come along with a
> real user.

As others said, it is possible to assign queues from the compression and
crypto accelerators on the Kunpeng920 to userspace, using the uacce char
device (upstream since last year, but waiting for implementations of the
SVA API in IOMMU drivers). I've been using that platform for testing my
code for the past year, with the UADK tool as well as an openssl plugin.

Securely assignig a queue to userspace requires full SVA support in
SMMUv3, which consists of PASID, page table sharing, and I/O page faults.
The first two were already merged, and the third one requires either Stall
or PRI. I'm not submitting PRI support at the moment because there is no
hardware, but the Hisilicon platform implements stall and will be able to
use it right away.

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu