Re: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-07 Thread Raj, Ashok
Hi Alex

+ Bjorn

FWIW I can't understand why PCI SIG went different ways with ATS, 
where its enumerated on PF and VF. But for PASID and PRI its only
in PF. 

I'm checking with our internal SIG reps to followup on that.

On Tue, Apr 07, 2020 at 09:58:01AM -0600, Alex Williamson wrote:
> > Is there vendor guarantee that hidden registers will locate at the
> > same offset between PF and VF config space? 
> 
> I'm not sure if the spec really precludes hidden registers, but the
> fact that these registers are explicitly outside of the capability
> chain implies they're only intended for device specific use, so I'd say
> there are no guarantees about anything related to these registers.

As you had suggested in the other thread, we could consider
using the same offset as in PF, but even that's a better guess
still not reliable.

The other option is to maybe extend driver ops in the PF to expose
where the offsets should be. Sort of adding the quirk in the 
implementation. 

I'm not sure how prevalent are PASID and PRI in VF devices. If SIG is resisting 
making VF's first class citizen, we might ask them to add some verbiage
to suggest leave the same offsets as PF open to help emulation software.


> 
> FWIW, vfio started out being more strict about restricting config space
> access to defined capabilities, until...
> 
> commit a7d1ea1c11b33bda2691f3294b4d735ed635535a
> Author: Alex Williamson 
> Date:   Mon Apr 1 09:04:12 2013 -0600
> 

Cheers,
Ashok

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v1 7/8] vfio/type1: Add VFIO_IOMMU_CACHE_INVALIDATE

2020-04-07 Thread Liu, Yi L
Hi Alex,

> From: Alex Williamson 
> Sent: Friday, April 3, 2020 11:35 PM
> Subject: Re: [PATCH v1 7/8] vfio/type1: Add VFIO_IOMMU_CACHE_INVALIDATE
> 
> On Fri, 3 Apr 2020 06:39:22 +
> "Tian, Kevin"  wrote:
> 
> > > From: Alex Williamson 
> > > Sent: Friday, April 3, 2020 4:24 AM
> > >
> > > On Sun, 22 Mar 2020 05:32:04 -0700
> > > "Liu, Yi L"  wrote:
> > >
> > > > From: Liu Yi L 
> > > >
[...]
> 
> > >
> > > > +
> > > > +   if (copy_from_user(_inv, (void __user *)arg, 
> > > > minsz))
> > > > +   return -EFAULT;
> > > > +
> > > > +   if (cache_inv.argsz < minsz || cache_inv.flags)
> > > > +   return -EINVAL;
> > > > +
> > > > +   /* Get the version of struct 
> > > > iommu_cache_invalidate_info */
> > > > +   if (copy_from_user(,
> > > > +   (void __user *) (arg + minsz), sizeof(version)))
> > > > +   return -EFAULT;
> > > > +
> > > > +   info_size = iommu_uapi_get_data_size(
> > > > +   IOMMU_UAPI_CACHE_INVAL,
> > > version);
> > > > +
> > > > +   cache_info = kzalloc(info_size, GFP_KERNEL);
> > > > +   if (!cache_info)
> > > > +   return -ENOMEM;
> > > > +
> > > > +   if (copy_from_user(cache_info,
> > > > +   (void __user *) (arg + minsz), info_size)) {
> > > > +   kfree(cache_info);
> > > > +   return -EFAULT;
> > > > +   }
> > > > +
> > > > +   mutex_lock(>lock);
> > > > +   ret = vfio_iommu_for_each_dev(iommu, vfio_cache_inv_fn,
> > > > +   cache_info);
> > >
> > > How does a user respond when their cache invalidate fails?  Isn't this
> > > also another case where our for_each_dev can fail at an arbitrary point
> > > leaving us with no idea whether each device even had the opportunity to
> > > perform the invalidation request.  I don't see how we have any chance
> > > to maintain coherency after this faults.
> >
> > Then can we make it simple to support singleton group only?
> 
> Are you suggesting a single group per container or a single device per
> group? Unless we have both, aren't we always going to have this issue.

Agreed. we need both to avoid the potential for_each_dev() loop issue.
I suppose this is also the most typical and desired config for vSVA
support. I think it makes sense with below items:

a) one group per container
PASID and nested translation gives user-space a chance to attach their
page table (e.g. guest process page table) to host IOMMU, this is vSVA.
If adding multiple groups to a vSVA-capable container, then a SVA bind
on this container means bind it with all groups (devices are included)
within the container. This doesn't make sense with three reasons: for
one the passthru devices are not necessary to be manipulated by same
guest application; for two passthru devices are not surely added in a
single guest group; for three not all passthru devices (either from
different group or same group) are sva capable.
As above, enforce one group per container makes sense to me.

b) one device per group
SVA support is limited to singleton group so far in bare-metal bind
per Jean's series. I think it's be good to follow it in passthru case.
https://patchwork.kernel.org/patch/10213877/
https://lkml.org/lkml/2019/4/10/663
As mentioned in a), group may have both SVA-capable device and non-SVA
-capable device, it would be a problem for VFIO to figure a way to isolate
them.

> OTOH, why should a cache invalidate fail?

there are sanity check done by vendor iommu driver against the invalidate
request from userspace. so it may fail if sanity check failed. But I guess
it may be better to something like abort instead of fail the request. isn't?

> 
> > > > +   mutex_unlock(>lock);
> > > > +   kfree(cache_info);
> > > > +   return ret;
> > > > }
> > > >
> > > > return -ENOTTY;
> > > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > > > index 2235bc6..62ca791 100644
> > > > --- a/include/uapi/linux/vfio.h
> > > > +++ b/include/uapi/linux/vfio.h
> > > > @@ -899,6 +899,28 @@ struct vfio_iommu_type1_bind {
> > > >   */
> > > >  #define VFIO_IOMMU_BIND_IO(VFIO_TYPE, VFIO_BASE + 23)
> > > >
> > > > +/**
> > > > + * VFIO_IOMMU_CACHE_INVALIDATE - _IOW(VFIO_TYPE, VFIO_BASE + 24,
> > > > + * struct vfio_iommu_type1_cache_invalidate)
> > > > + *
> > > > + * Propagate guest IOMMU cache invalidation to the host. The cache
> > > > + * invalidation information is conveyed by @cache_info, the content
> > > > + * format would be structures defined in uapi/linux/iommu.h. User
> > > > + * should be aware of that the struct  iommu_cache_invalidate_info
> > > > + * has a @version field, vfio needs to parse this field before getting
> > > > + 

RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

2020-04-07 Thread Liu, Yi L
> From: Liu, Yi L
> Sent: Tuesday, April 7, 2020 5:43 PM
>
> > We don't, the PASID spaces are per-VM on Arm, so this function should
> > consult the IOMMU driver before setting flags. As you said on patch 3,
> > nested doesn't necessarily imply PASID support. The SMMUv2 does not
> > support PASID but does support nesting stages 1 and 2 for the IOVA space.
> > SMMUv3 support of PASID depends on HW capabilities. So I think this
> > needs to be finer grained:
> >
> > Does the container support:
> > * VFIO_IOMMU_PASID_REQUEST?
> >   -> Yes for VT-d 3
> >   -> No for Arm SMMU
> > * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL?
> >   -> Yes for VT-d 3
> >   -> Sometimes for SMMUv2
> >   -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler due to
> >  PASID tables being in GPA space.)
> > * VFIO_IOMMU_BIND_PASID_TABLE?
> >   -> No for VT-d
> >   -> Sometimes for SMMUv3
> >
> > Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support.
> 
> good summary. do you expect to see any
please ignore this message. I planned to ask if possible to report
VFIO_IOMMU_CACHE_INVALIDATE  only (no bind support). But I stopped
typing it when I came to believe it's unnecessary to report it if
there is no bind support.

Regards,
Yi Liu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v1 1/8] vfio: Add VFIO_IOMMU_PASID_REQUEST(alloc/free)

2020-04-07 Thread Liu, Yi L
Hi Alex,
> From: Alex Williamson 
> Sent: Saturday, April 4, 2020 1:50 AM
> Subject: Re: [PATCH v1 1/8] vfio: Add VFIO_IOMMU_PASID_REQUEST(alloc/free)
> 
> On Fri, 3 Apr 2020 13:12:50 +
> "Liu, Yi L"  wrote:
> 
> > Hi Alex,
> >
> > > From: Alex Williamson 
> > > Sent: Friday, April 3, 2020 1:50 AM
> > > Subject: Re: [PATCH v1 1/8] vfio: Add VFIO_IOMMU_PASID_REQUEST(alloc/free)
> > >
> > > On Sun, 22 Mar 2020 05:31:58 -0700
> > > "Liu, Yi L"  wrote:
> > >
> > > > From: Liu Yi L 
> > > >
[...]
> > > >  static long vfio_iommu_type1_ioctl(void *iommu_data,
> > > >unsigned int cmd, unsigned long arg)
> > > >  {
> > > > @@ -2276,6 +2333,53 @@ static long vfio_iommu_type1_ioctl(void
> > > *iommu_data,
> > > >
> > > > return copy_to_user((void __user *)arg, , minsz) ?
> > > > -EFAULT : 0;
> > > > +
> > > > +   } else if (cmd == VFIO_IOMMU_PASID_REQUEST) {
> > > > +   struct vfio_iommu_type1_pasid_request req;
> > > > +   unsigned long offset;
> > > > +
> > > > +   minsz = offsetofend(struct 
> > > > vfio_iommu_type1_pasid_request,
> > > > +   flags);
> > > > +
> > > > +   if (copy_from_user(, (void __user *)arg, minsz))
> > > > +   return -EFAULT;
> > > > +
> > > > +   if (req.argsz < minsz ||
> > > > +   !vfio_iommu_type1_pasid_req_valid(req.flags))
> > > > +   return -EINVAL;
> > > > +
> > > > +   if (copy_from_user((void *) + minsz,
> > > > +  (void __user *)arg + minsz,
> > > > +  sizeof(req) - minsz))
> > > > +   return -EFAULT;
> > >
> > > Huh?  Why do we have argsz if we're going to assume this is here?
> >
> > do you mean replacing sizeof(req) with argsz? if yes, I can do that.
> 
> No, I mean the user tells us how much data they've provided via argsz.
> We create minsz the the end of flags and verify argsz includes flags.
> Then we proceed to ignore argsz to see if the user has provided the
> remainder of the structure.

I think I should avoid using sizeof(req) as it may be variable
new flag is added. I think better to make a data[] field in struct
vfio_iommu_type1_pasid_request and copy data[] per flag. I'll
make this change in new version.

> > > > +
> > > > +   switch (req.flags & VFIO_PASID_REQUEST_MASK) {
> > > > +   case VFIO_IOMMU_PASID_ALLOC:
> > > > +   {
> > > > +   int ret = 0, result;
> > > > +
> > > > +   result = vfio_iommu_type1_pasid_alloc(iommu,
> > > > +   
> > > > req.alloc_pasid.min,
> > > > +   
> > > > req.alloc_pasid.max);
> > > > +   if (result > 0) {
> > > > +   offset = offsetof(
> > > > +   struct 
> > > > vfio_iommu_type1_pasid_request,
> > > > +   alloc_pasid.result);
> > > > +   ret = copy_to_user(
> > > > + (void __user *) (arg + 
> > > > offset),
> > > > + , sizeof(result));
> > >
> > > Again assuming argsz supports this.
> >
> > same as above.
> >
> > >
> > > > +   } else {
> > > > +   pr_debug("%s: PASID alloc failed\n", 
> > > > __func__);
> > >
> > > rate limit?
> >
> > not quite get. could you give more hints?
> 
> A user can spam the host logs simply by allocating their quota of
> PASIDs and then trying to allocate more, or by specifying min/max such
> that they cannot allocate the requested PASID.  If this logging is
> necessary for debugging, it should be ratelimited to avoid a DoS on the
> host.

got it. thanks for the coaching. will use pr_debug_ratelimited().

Regards,
Yi Liu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-07 Thread Tian, Kevin
> From: Alex Williamson 
> Sent: Tuesday, April 7, 2020 11:58 PM
> 
> On Tue, 7 Apr 2020 04:26:23 +
> "Tian, Kevin"  wrote:
> 
> > > From: Alex Williamson 
> > > Sent: Saturday, April 4, 2020 1:26 AM
> > [...]
> > > > > > +   if (!pasid_cap.control_reg.paside) {
> > > > > > +   pr_debug("%s: its PF's PASID capability is not
> enabled\n",
> > > > > > +   dev_name(>pdev->dev));
> > > > > > +   ret = 0;
> > > > > > +   goto out;
> > > > > > +   }
> > > > >
> > > > > What happens if the PF's PASID gets disabled while we're using it??
> > > >
> > > > This is actually the open I highlighted in cover letter. Per the reply
> > > > from Baolu, this seems to be an open for bare-metal all the same.
> > > > https://lkml.org/lkml/2020/3/31/95
> > >
> > > Seems that needs to get sorted out before we can expose this.  Maybe
> > > some sort of registration with the PF driver that PASID is being used
> > > by a VF so it cannot be disabled?
> >
> > I guess we may do vSVA for PF first, and then adding VF vSVA later
> > given above additional need. It's not necessarily to enable both
> > in one step.
> >
> > [...]
> > > > > > @@ -1604,6 +1901,18 @@ static int vfio_ecap_init(struct
> > > vfio_pci_device *vdev)
> > > > > > if (!ecaps)
> > > > > > *(u32 *)>vconfig[PCI_CFG_SPACE_SIZE] = 0;
> > > > > >
> > > > > > +#ifdef CONFIG_PCI_ATS
> > > > > > +   if (pdev->is_virtfn) {
> > > > > > +   struct pci_dev *physfn = pdev->physfn;
> > > > > > +
> > > > > > +   ret = vfio_pci_add_emulated_cap_for_vf(vdev,
> > > > > > +   physfn, epos_max, prev);
> > > > > > +   if (ret)
> > > > > > +   pr_info("%s, failed to add special caps for
> VF %s\n",
> > > > > > +   __func__, dev_name(>pdev-
> >dev));
> > > > > > +   }
> > > > > > +#endif
> > > > >
> > > > > I can only imagine that we should place the caps at the same location
> > > > > they exist on the PF, we don't know what hidden registers might be
> > > > > hiding in config space.
> >
> > Is there vendor guarantee that hidden registers will locate at the
> > same offset between PF and VF config space?
> 
> I'm not sure if the spec really precludes hidden registers, but the
> fact that these registers are explicitly outside of the capability
> chain implies they're only intended for device specific use, so I'd say
> there are no guarantees about anything related to these registers.
> 
> FWIW, vfio started out being more strict about restricting config space
> access to defined capabilities, until...
> 
> commit a7d1ea1c11b33bda2691f3294b4d735ed635535a
> Author: Alex Williamson 
> Date:   Mon Apr 1 09:04:12 2013 -0600
> 
> vfio-pci: Enable raw access to unassigned config space
> 
> Devices like be2net hide registers between the gaps in capabilities
> and architected regions of PCI config space.  Our choices to support
> such devices is to either build an ever growing and unmanageable white
> list or rely on hardware isolation to protect us.  These registers are
> really no different than MMIO or I/O port space registers, which we
> don't attempt to regulate, so treat PCI config space in the same way.
> 
> > > > but we are not sure whether the same location is available on VF. In
> > > > this patch, it actually places the emulated cap physically behind the
> > > > cap which lays farthest (its offset is largest) within VF's config space
> > > > as the PCIe caps are linked in a chain.
> > >
> > > But, as we've found on Broadcom NICs (iirc), hardware developers have a
> > > nasty habit of hiding random registers in PCI config space, outside of
> > > defined capabilities.  I feel like IGD might even do this too, is that
> > > true?  So I don't think we can guarantee that just because a section of
> > > config space isn't part of a defined capability that its unused.  It
> > > only means that it's unused by common code, but it might have device
> > > specific purposes.  So of the PCIe spec indicates that VFs cannot
> > > include these capabilities and virtialization software needs to
> > > emulate them, we need somewhere safe to place them in config space,
> and
> > > simply placing them off the end of known capabilities doesn't give me
> > > any confidence.  Also, hardware has no requirement to make compact
> use
> > > of extended config space.  The first capability must be at 0x100, the
> > > very next capability could consume all the way to the last byte of the
> > > 4K extended range, and the next link in the chain could be somewhere in
> > > the middle.  Thanks,
> > >
> >
> > Then what would be a viable option? Vendor nasty habit implies
> > no standard, thus I don't see how VFIO can find a safe location
> > by itself. Also curious how those hidden registers are identified
> > by VFIO and employed with proper r/w policy today. If sort of quirks
> > are used, then could such quirk way be extended to also carry

[RFC PATCH 17/34] iommu/arm-smmu: Store device instead of group in arm_smmu_s2cr

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

This is required to convert the arm-smmu driver to the
probe/release_device() interface.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/arm-smmu.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index a6a5796e9c41..3493501d8b2c 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -69,7 +69,7 @@ MODULE_PARM_DESC(disable_bypass,
"Disable bypass streams such that incoming transactions from devices 
that are not attached to an iommu domain will report an abort back to the 
device and will not be allowed to pass through the SMMU.");
 
 struct arm_smmu_s2cr {
-   struct iommu_group  *group;
+   struct device   *dev;
int count;
enum arm_smmu_s2cr_type type;
enum arm_smmu_s2cr_privcfg  privcfg;
@@ -1100,7 +1100,7 @@ static int arm_smmu_master_alloc_smes(struct device *dev)
/* It worked! Now, poke the actual hardware */
for_each_cfg_sme(cfg, fwspec, i, idx) {
arm_smmu_write_sme(smmu, idx);
-   smmu->s2crs[idx].group = group;
+   smmu->s2crs[idx].dev = dev;
}
 
mutex_unlock(>stream_map_mutex);
@@ -1495,11 +1495,15 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
int i, idx;
 
for_each_cfg_sme(cfg, fwspec, i, idx) {
-   if (group && smmu->s2crs[idx].group &&
-   group != smmu->s2crs[idx].group)
+   struct iommu_group *idx_grp = NULL;
+
+   if (smmu->s2crs[idx].dev)
+   idx_grp = smmu->s2crs[idx].dev->iommu_group;
+
+   if (group && idx_grp && group != idx_grp)
return ERR_PTR(-EINVAL);
 
-   group = smmu->s2crs[idx].group;
+   group = idx_grp;
}
 
if (group)
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 26/34] iommu/rockchip: Convert to probe/release_device() call-backs

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

Convert the Rockchip IOMMU driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/rockchip-iommu.c | 26 +++---
 1 file changed, 7 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index b33cdd5aad81..d25c2486ca07 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -1054,40 +1054,28 @@ static void rk_iommu_domain_free(struct iommu_domain 
*domain)
kfree(rk_domain);
 }
 
-static int rk_iommu_add_device(struct device *dev)
+static struct iommu_device *rk_iommu_probe_device(struct device *dev)
 {
-   struct iommu_group *group;
-   struct rk_iommu *iommu;
struct rk_iommudata *data;
+   struct rk_iommu *iommu;
 
data = dev->archdata.iommu;
if (!data)
-   return -ENODEV;
+   return ERR_PTR(-ENODEV);
 
iommu = rk_iommu_from_dev(dev);
 
-   group = iommu_group_get_for_dev(dev);
-   if (IS_ERR(group))
-   return PTR_ERR(group);
-   iommu_group_put(group);
-
-   iommu_device_link(>iommu, dev);
data->link = device_link_add(dev, iommu->dev,
 DL_FLAG_STATELESS | DL_FLAG_PM_RUNTIME);
 
-   return 0;
+   return >iommu;
 }
 
-static void rk_iommu_remove_device(struct device *dev)
+static void rk_iommu_release_device(struct device *dev)
 {
-   struct rk_iommu *iommu;
struct rk_iommudata *data = dev->archdata.iommu;
 
-   iommu = rk_iommu_from_dev(dev);
-
device_link_del(data->link);
-   iommu_device_unlink(>iommu, dev);
-   iommu_group_remove_device(dev);
 }
 
 static struct iommu_group *rk_iommu_device_group(struct device *dev)
@@ -1126,8 +1114,8 @@ static const struct iommu_ops rk_iommu_ops = {
.detach_dev = rk_iommu_detach_device,
.map = rk_iommu_map,
.unmap = rk_iommu_unmap,
-   .add_device = rk_iommu_add_device,
-   .remove_device = rk_iommu_remove_device,
+   .probe_device = rk_iommu_probe_device,
+   .release_device = rk_iommu_release_device,
.iova_to_phys = rk_iommu_iova_to_phys,
.device_group = rk_iommu_device_group,
.pgsize_bitmap = RK_IOMMU_PGSIZE_BITMAP,
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 18/34] iommu/arm-smmu: Convert to probe/release_device() call-backs

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

Convert the arm-smmu and arm-smmu-v3 drivers to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code does the
group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/arm-smmu-v3.c | 42 +
 drivers/iommu/arm-smmu.c| 30 --
 2 files changed, 19 insertions(+), 53 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 82508730feb7..7d3c38e088d7 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2914,27 +2914,26 @@ static bool arm_smmu_sid_in_range(struct 
arm_smmu_device *smmu, u32 sid)
 
 static struct iommu_ops arm_smmu_ops;
 
-static int arm_smmu_add_device(struct device *dev)
+static struct iommu_device *arm_smmu_probe_device(struct device *dev)
 {
int i, ret;
struct arm_smmu_device *smmu;
struct arm_smmu_master *master;
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-   struct iommu_group *group;
 
if (!fwspec || fwspec->ops != _smmu_ops)
-   return -ENODEV;
+   return ERR_PTR(-ENODEV);
 
if (WARN_ON_ONCE(dev_iommu_priv_get(dev)))
-   return -EBUSY;
+   return ERR_PTR(-EBUSY);
 
smmu = arm_smmu_get_by_fwnode(fwspec->iommu_fwnode);
if (!smmu)
-   return -ENODEV;
+   return ERR_PTR(-ENODEV);
 
master = kzalloc(sizeof(*master), GFP_KERNEL);
if (!master)
-   return -ENOMEM;
+   return ERR_PTR(-ENOMEM);
 
master->dev = dev;
master->smmu = smmu;
@@ -2971,34 +2970,15 @@ static int arm_smmu_add_device(struct device *dev)
 */
arm_smmu_enable_pasid(master);
 
-   if (!(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
-   master->ssid_bits = min_t(u8, master->ssid_bits,
- CTXDESC_LINEAR_CDMAX);
-
-   ret = iommu_device_link(>iommu, dev);
-   if (ret)
-   goto err_disable_pasid;
-
-   group = iommu_group_get_for_dev(dev);
-   if (IS_ERR(group)) {
-   ret = PTR_ERR(group);
-   goto err_unlink;
-   }
-
-   iommu_group_put(group);
-   return 0;
+   return >iommu;
 
-err_unlink:
-   iommu_device_unlink(>iommu, dev);
-err_disable_pasid:
-   arm_smmu_disable_pasid(master);
 err_free_master:
kfree(master);
dev_iommu_priv_set(dev, NULL);
-   return ret;
+   return ERR_PTR(ret);
 }
 
-static void arm_smmu_remove_device(struct device *dev)
+static void arm_smmu_release_device(struct device *dev)
 {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct arm_smmu_master *master;
@@ -3010,8 +2990,6 @@ static void arm_smmu_remove_device(struct device *dev)
master = dev_iommu_priv_get(dev);
smmu = master->smmu;
arm_smmu_detach_dev(master);
-   iommu_group_remove_device(dev);
-   iommu_device_unlink(>iommu, dev);
arm_smmu_disable_pasid(master);
kfree(master);
iommu_fwspec_free(dev);
@@ -3138,8 +3116,8 @@ static struct iommu_ops arm_smmu_ops = {
.flush_iotlb_all= arm_smmu_flush_iotlb_all,
.iotlb_sync = arm_smmu_iotlb_sync,
.iova_to_phys   = arm_smmu_iova_to_phys,
-   .add_device = arm_smmu_add_device,
-   .remove_device  = arm_smmu_remove_device,
+   .probe_device   = arm_smmu_probe_device,
+   .release_device = arm_smmu_release_device,
.device_group   = arm_smmu_device_group,
.domain_get_attr= arm_smmu_domain_get_attr,
.domain_set_attr= arm_smmu_domain_set_attr,
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 3493501d8b2c..7b13b2371ad2 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -220,7 +220,7 @@ static int arm_smmu_register_legacy_master(struct device 
*dev,
  * With the legacy DT binding in play, we have no guarantees about
  * probe order, but then we're also not doing default domains, so we can
  * delay setting bus ops until we're sure every possible SMMU is ready,
- * and that way ensure that no add_device() calls get missed.
+ * and that way ensure that no probe_device() calls get missed.
  */
 static int arm_smmu_legacy_bus_init(void)
 {
@@ -1062,7 +1062,6 @@ static int arm_smmu_master_alloc_smes(struct device *dev)
struct arm_smmu_master_cfg *cfg = dev_iommu_priv_get(dev);
struct arm_smmu_device *smmu = cfg->smmu;
struct arm_smmu_smr *smrs = smmu->smrs;
-   struct iommu_group *group;
int i, idx, ret;
 
mutex_lock(>stream_map_mutex);
@@ -1090,13 +1089,6 @@ static int arm_smmu_master_alloc_smes(struct device *dev)
cfg->smendx[i] = (s16)idx;
}
 
-   group = iommu_group_get_for_dev(dev);
-   if 

[RFC PATCH 30/34] iommu/omap: Convert to probe/release_device() call-backs

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

Convert the OMAP IOMMU driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/omap-iommu.c | 49 ++
 1 file changed, 13 insertions(+), 36 deletions(-)

diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index ecc9d0829a91..6699fe6d9e06 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -1640,15 +1640,13 @@ static phys_addr_t omap_iommu_iova_to_phys(struct 
iommu_domain *domain,
return ret;
 }
 
-static int omap_iommu_add_device(struct device *dev)
+static struct iommu_device *omap_iommu_probe_device(struct device *dev)
 {
struct omap_iommu_arch_data *arch_data, *tmp;
+   struct platform_device *pdev;
struct omap_iommu *oiommu;
-   struct iommu_group *group;
struct device_node *np;
-   struct platform_device *pdev;
int num_iommus, i;
-   int ret;
 
/*
 * Allocate the archdata iommu structure for DT-based devices.
@@ -1657,7 +1655,7 @@ static int omap_iommu_add_device(struct device *dev)
 * IOMMU users.
 */
if (!dev->of_node)
-   return 0;
+   return ERR_PTR(-ENODEV);
 
/*
 * retrieve the count of IOMMU nodes using phandle size as element size
@@ -1670,27 +1668,27 @@ static int omap_iommu_add_device(struct device *dev)
 
arch_data = kcalloc(num_iommus + 1, sizeof(*arch_data), GFP_KERNEL);
if (!arch_data)
-   return -ENOMEM;
+   return ERR_PTR(-ENOMEM);
 
for (i = 0, tmp = arch_data; i < num_iommus; i++, tmp++) {
np = of_parse_phandle(dev->of_node, "iommus", i);
if (!np) {
kfree(arch_data);
-   return -EINVAL;
+   return ERR_PTR(-EINVAL);
}
 
pdev = of_find_device_by_node(np);
if (!pdev) {
of_node_put(np);
kfree(arch_data);
-   return -ENODEV;
+   return ERR_PTR(-ENODEV);
}
 
oiommu = platform_get_drvdata(pdev);
if (!oiommu) {
of_node_put(np);
kfree(arch_data);
-   return -EINVAL;
+   return ERR_PTR(-EINVAL);
}
 
tmp->iommu_dev = oiommu;
@@ -1699,46 +1697,25 @@ static int omap_iommu_add_device(struct device *dev)
of_node_put(np);
}
 
+   dev->archdata.iommu = arch_data;
+
/*
 * use the first IOMMU alone for the sysfs device linking.
 * TODO: Evaluate if a single iommu_group needs to be
 * maintained for both IOMMUs
 */
oiommu = arch_data->iommu_dev;
-   ret = iommu_device_link(>iommu, dev);
-   if (ret) {
-   kfree(arch_data);
-   return ret;
-   }
-
-   dev->archdata.iommu = arch_data;
-
-   /*
-* IOMMU group initialization calls into omap_iommu_device_group, which
-* needs a valid dev->archdata.iommu pointer
-*/
-   group = iommu_group_get_for_dev(dev);
-   if (IS_ERR(group)) {
-   iommu_device_unlink(>iommu, dev);
-   dev->archdata.iommu = NULL;
-   kfree(arch_data);
-   return PTR_ERR(group);
-   }
-   iommu_group_put(group);
 
-   return 0;
+   return >iommu;
 }
 
-static void omap_iommu_remove_device(struct device *dev)
+static void omap_iommu_release_device(struct device *dev)
 {
struct omap_iommu_arch_data *arch_data = dev->archdata.iommu;
 
if (!dev->of_node || !arch_data)
return;
 
-   iommu_device_unlink(_data->iommu_dev->iommu, dev);
-   iommu_group_remove_device(dev);
-
dev->archdata.iommu = NULL;
kfree(arch_data);
 
@@ -1763,8 +1740,8 @@ static const struct iommu_ops omap_iommu_ops = {
.map= omap_iommu_map,
.unmap  = omap_iommu_unmap,
.iova_to_phys   = omap_iommu_iova_to_phys,
-   .add_device = omap_iommu_add_device,
-   .remove_device  = omap_iommu_remove_device,
+   .probe_device   = omap_iommu_probe_device,
+   .release_device = omap_iommu_release_device,
.device_group   = omap_iommu_device_group,
.pgsize_bitmap  = OMAP_IOMMU_PGSIZES,
 };
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 10/34] iommu: Move new probe_device path to separate function

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

This makes it easier to remove to old code-path when all drivers are
converted. As a side effect that it also fixes the error cleanup
path.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 69 ---
 1 file changed, 46 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 18eb3623bd00..8be047a4808f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -218,12 +218,55 @@ static int __iommu_probe_device(struct device *dev, 
struct list_head *group_list
return ret;
 }
 
+static int __iommu_probe_device_helper(struct device *dev)
+{
+   const struct iommu_ops *ops = dev->bus->iommu_ops;
+   struct iommu_group *group;
+   int ret;
+
+   ret = __iommu_probe_device(dev, NULL);
+   if (ret)
+   goto err_out;
+
+   /*
+* Try to allocate a default domain - needs support from the
+* IOMMU driver. There are still some drivers which don't
+* support default domains, so the return value is not yet
+* checked.
+*/
+   iommu_alloc_default_domain(dev);
+
+   group = iommu_group_get(dev);
+   if (!group)
+   goto err_release;
+
+   if (group->default_domain)
+   ret = __iommu_attach_device(group->default_domain, dev);
+
+   iommu_group_put(group);
+
+   if (ret)
+   goto err_release;
+
+   if (ops->probe_finalize)
+   ops->probe_finalize(dev);
+
+   return 0;
+
+err_release:
+   iommu_release_device(dev);
+err_out:
+   return ret;
+
+}
+
 int iommu_probe_device(struct device *dev)
 {
const struct iommu_ops *ops = dev->bus->iommu_ops;
int ret;
 
WARN_ON(dev->iommu_group);
+
if (!ops)
return -EINVAL;
 
@@ -235,30 +278,10 @@ int iommu_probe_device(struct device *dev)
goto err_free_dev_param;
}
 
-   if (ops->probe_device) {
-   struct iommu_group *group;
-
-   ret = __iommu_probe_device(dev, NULL);
-
-   /*
-* Try to allocate a default domain - needs support from the
-* IOMMU driver. There are still some drivers which don't
-* support default domains, so the return value is not yet
-* checked.
-*/
-   if (!ret)
-   iommu_alloc_default_domain(dev);
-
-   group = iommu_group_get(dev);
-   if (group && group->default_domain) {
-   ret = __iommu_attach_device(group->default_domain, dev);
-   iommu_group_put(group);
-   }
-
-   } else {
-   ret = ops->add_device(dev);
-   }
+   if (ops->probe_device)
+   return __iommu_probe_device_helper(dev);
 
+   ret = ops->add_device(dev);
if (ret)
goto err_module_put;
 
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 13/34] iommu: Export bus_iommu_probe() and make is safe for re-probing

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

Add a check to the bus_iommu_probe() call-path to make sure it ignores
devices which have already been successfully probed. Then export the
bus_iommu_probe() function so it can be used by IOMMU drivers.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 6 +-
 include/linux/iommu.h | 1 +
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 844613850595..cf25c1e48830 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1615,6 +1615,10 @@ static int probe_iommu_group(struct device *dev, void 
*data)
if (!dev_iommu_get(dev))
return -ENOMEM;
 
+   /* Device is probed already if in a group */
+   if (iommu_group_get(dev) != NULL)
+   return 0;
+
if (!try_module_get(ops->owner)) {
ret = -EINVAL;
goto err_free_dev_iommu;
@@ -1783,7 +1787,7 @@ static int iommu_group_create_direct_mappings(struct 
iommu_group *group)
  iommu_do_create_direct_mappings);
 }
 
-static int bus_iommu_probe(struct bus_type *bus)
+int bus_iommu_probe(struct bus_type *bus)
 {
const struct iommu_ops *ops = bus->iommu_ops;
int ret;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 30170d191e5e..fea1622408ad 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -445,6 +445,7 @@ static inline void iommu_iotlb_gather_init(struct 
iommu_iotlb_gather *gather)
 #define IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER  6 /* Post Driver unbind */
 
 extern int bus_set_iommu(struct bus_type *bus, const struct iommu_ops *ops);
+extern int bus_iommu_probe(struct bus_type *bus);
 extern bool iommu_present(struct bus_type *bus);
 extern bool iommu_capable(struct bus_type *bus, enum iommu_cap cap);
 extern struct iommu_domain *iommu_domain_alloc(struct bus_type *bus);
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 19/34] iommu/pamu: Convert to probe/release_device() call-backs

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

Convert the PAMU IOMMU driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/fsl_pamu_domain.c | 22 +-
 1 file changed, 5 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index 06828e2698d5..928d37771ece 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -1016,25 +1016,13 @@ static struct iommu_group *fsl_pamu_device_group(struct 
device *dev)
return group;
 }
 
-static int fsl_pamu_add_device(struct device *dev)
+static struct iommu_device *fsl_pamu_probe_device(struct device *dev)
 {
-   struct iommu_group *group;
-
-   group = iommu_group_get_for_dev(dev);
-   if (IS_ERR(group))
-   return PTR_ERR(group);
-
-   iommu_group_put(group);
-
-   iommu_device_link(_iommu, dev);
-
-   return 0;
+   return _iommu;
 }
 
-static void fsl_pamu_remove_device(struct device *dev)
+static void fsl_pamu_release_device(struct device *dev)
 {
-   iommu_device_unlink(_iommu, dev);
-   iommu_group_remove_device(dev);
 }
 
 static const struct iommu_ops fsl_pamu_ops = {
@@ -1048,8 +1036,8 @@ static const struct iommu_ops fsl_pamu_ops = {
.iova_to_phys   = fsl_pamu_iova_to_phys,
.domain_set_attr = fsl_pamu_set_domain_attr,
.domain_get_attr = fsl_pamu_get_domain_attr,
-   .add_device = fsl_pamu_add_device,
-   .remove_device  = fsl_pamu_remove_device,
+   .probe_device   = fsl_pamu_probe_device,
+   .release_device = fsl_pamu_release_device,
.device_group   = fsl_pamu_device_group,
 };
 
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 16/34] iommu/vt-d: Convert to probe/release_device() call-backs

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

Convert the Intel IOMMU driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/intel-iommu.c | 67 -
 1 file changed, 6 insertions(+), 61 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index b9f905a55dda..b906727f5b85 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5781,78 +5781,27 @@ static bool intel_iommu_capable(enum iommu_cap cap)
return false;
 }
 
-static int intel_iommu_add_device(struct device *dev)
+static struct iommu_device *intel_iommu_probe_device(struct device *dev)
 {
-   struct dmar_domain *dmar_domain;
-   struct iommu_domain *domain;
struct intel_iommu *iommu;
-   struct iommu_group *group;
u8 bus, devfn;
-   int ret;
 
iommu = device_to_iommu(dev, , );
if (!iommu)
-   return -ENODEV;
-
-   iommu_device_link(>iommu, dev);
+   return ERR_PTR(-ENODEV);
 
if (translation_pre_enabled(iommu))
dev->archdata.iommu = DEFER_DEVICE_DOMAIN_INFO;
 
-   group = iommu_group_get_for_dev(dev);
-
-   if (IS_ERR(group)) {
-   ret = PTR_ERR(group);
-   goto unlink;
-   }
-
-   iommu_group_put(group);
-
-   domain = iommu_get_domain_for_dev(dev);
-   dmar_domain = to_dmar_domain(domain);
-   if (domain->type == IOMMU_DOMAIN_DMA) {
-   if (device_def_domain_type(dev) == IOMMU_DOMAIN_IDENTITY) {
-   ret = iommu_request_dm_for_dev(dev);
-   if (ret) {
-   dmar_remove_one_dev_info(dev);
-   dmar_domain->flags |= DOMAIN_FLAG_LOSE_CHILDREN;
-   domain_add_dev_info(si_domain, dev);
-   dev_info(dev,
-"Device uses a private identity 
domain.\n");
-   }
-   }
-   } else {
-   if (device_def_domain_type(dev) == IOMMU_DOMAIN_DMA) {
-   ret = iommu_request_dma_domain_for_dev(dev);
-   if (ret) {
-   dmar_remove_one_dev_info(dev);
-   dmar_domain->flags |= DOMAIN_FLAG_LOSE_CHILDREN;
-   if (!get_private_domain_for_dev(dev)) {
-   dev_warn(dev,
-"Failed to get a private 
domain.\n");
-   ret = -ENOMEM;
-   goto unlink;
-   }
-
-   dev_info(dev,
-"Device uses a private dma domain.\n");
-   }
-   }
-   }
-
if (device_needs_bounce(dev)) {
dev_info(dev, "Use Intel IOMMU bounce page dma_ops\n");
set_dma_ops(dev, _dma_ops);
}
 
-   return 0;
-
-unlink:
-   iommu_device_unlink(>iommu, dev);
-   return ret;
+   return >iommu;
 }
 
-static void intel_iommu_remove_device(struct device *dev)
+static void intel_iommu_release_device(struct device *dev)
 {
struct intel_iommu *iommu;
u8 bus, devfn;
@@ -5863,10 +5812,6 @@ static void intel_iommu_remove_device(struct device *dev)
 
dmar_remove_one_dev_info(dev);
 
-   iommu_group_remove_device(dev);
-
-   iommu_device_unlink(>iommu, dev);
-
if (device_needs_bounce(dev))
set_dma_ops(dev, NULL);
 }
@@ -6198,8 +6143,8 @@ const struct iommu_ops intel_iommu_ops = {
.map= intel_iommu_map,
.unmap  = intel_iommu_unmap,
.iova_to_phys   = intel_iommu_iova_to_phys,
-   .add_device = intel_iommu_add_device,
-   .remove_device  = intel_iommu_remove_device,
+   .probe_device   = intel_iommu_probe_device,
+   .release_device = intel_iommu_release_device,
.get_resv_regions   = intel_iommu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
.apply_resv_region  = intel_iommu_apply_resv_region,
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 34/34] iommu: Unexport iommu_group_get_for_dev()

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

The function is now only used in IOMMU core code and shouldn't be used
outside of it anyway, so remove the export for it.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 4 ++--
 include/linux/iommu.h | 1 -
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d9032f9d597c..8a5e1ac328dd 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -91,6 +91,7 @@ static void __iommu_detach_group(struct iommu_domain *domain,
 struct iommu_group *group);
 static int iommu_create_device_direct_mappings(struct iommu_group *group,
   struct device *dev);
+static struct iommu_group *iommu_group_get_for_dev(struct device *dev);
 
 #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)  \
 struct iommu_group_attribute iommu_group_attr_##_name =\
@@ -1483,7 +1484,7 @@ static int iommu_alloc_default_domain(struct device *dev)
  * to the returned IOMMU group, which will already include the provided
  * device.  The reference should be released with iommu_group_put().
  */
-struct iommu_group *iommu_group_get_for_dev(struct device *dev)
+static struct iommu_group *iommu_group_get_for_dev(struct device *dev)
 {
const struct iommu_ops *ops = dev->bus->iommu_ops;
struct iommu_group *group;
@@ -1514,7 +1515,6 @@ struct iommu_group *iommu_group_get_for_dev(struct device 
*dev)
 
return ERR_PTR(ret);
 }
-EXPORT_SYMBOL(iommu_group_get_for_dev);
 
 struct iommu_domain *iommu_group_default_domain(struct iommu_group *group)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index dd076366383f..7cfd2dddb49d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -527,7 +527,6 @@ extern int iommu_page_response(struct device *dev,
   struct iommu_page_response *msg);
 
 extern int iommu_group_id(struct iommu_group *group);
-extern struct iommu_group *iommu_group_get_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_group_default_domain(struct iommu_group *);
 
 extern int iommu_domain_get_attr(struct iommu_domain *domain, enum iommu_attr,
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 14/34] iommu/amd: Remove dev_data->passthrough

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

Make use of generic IOMMU infrastructure to gather the same information
carried in dev_data->passthrough and remove the struct member.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd_iommu.c   | 10 +-
 drivers/iommu/amd_iommu_types.h |  1 -
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 3e0d27f7622e..0b4b4faa876d 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2047,8 +2047,8 @@ static int pdev_iommuv2_enable(struct pci_dev *pdev)
 static int attach_device(struct device *dev,
 struct protection_domain *domain)
 {
-   struct pci_dev *pdev;
struct iommu_dev_data *dev_data;
+   struct pci_dev *pdev;
unsigned long flags;
int ret;
 
@@ -2067,8 +2067,10 @@ static int attach_device(struct device *dev,
 
pdev = to_pci_dev(dev);
if (domain->flags & PD_IOMMUV2_MASK) {
+   struct iommu_domain *def_domain = iommu_get_dma_domain(dev);
+
ret = -EINVAL;
-   if (!dev_data->passthrough)
+   if (def_domain->type != IOMMU_DOMAIN_IDENTITY)
goto out;
 
if (dev_data->iommu_v2) {
@@ -2189,9 +2191,7 @@ static int amd_iommu_add_device(struct device *dev)
 
/* Domains are initialized for this device - have a look what we ended 
up with */
domain = iommu_get_domain_for_dev(dev);
-   if (domain->type == IOMMU_DOMAIN_IDENTITY)
-   dev_data->passthrough = true;
-   else if (domain->type == IOMMU_DOMAIN_DMA)
+   if (domain->type == IOMMU_DOMAIN_DMA)
iommu_setup_dma_ops(dev, IOVA_START_PFN << PAGE_SHIFT, 0);
 
 out:
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index ca8c4522045b..d0d7b6a0c3d8 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -640,7 +640,6 @@ struct iommu_dev_data {
struct pci_dev *pdev;
u16 devid;/* PCI Device ID */
bool iommu_v2;/* Device can make use of IOMMUv2 */
-   bool passthrough; /* Device is identity mapped */
struct {
bool enabled;
int qdep;
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 05/34] iommu/amd: Remove dma_mask check from check_device()

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

The check was only needed for the DMA-API implementation in the AMD
IOMMU driver, which no longer exists.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 73b4f84cf449..504f2db75eda 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -326,7 +326,7 @@ static bool check_device(struct device *dev)
 {
int devid;
 
-   if (!dev || !dev->dma_mask)
+   if (!dev)
return false;
 
devid = get_device_id(dev);
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 21/34] iommu/virtio: Convert to probe/release_device() call-backs

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

Convert the VirtIO IOMMU driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/virtio-iommu.c | 41 +---
 1 file changed, 10 insertions(+), 31 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index d5cac4f46ca5..bda300c2a438 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -865,24 +865,23 @@ static struct viommu_dev *viommu_get_by_fwnode(struct 
fwnode_handle *fwnode)
return dev ? dev_to_virtio(dev)->priv : NULL;
 }
 
-static int viommu_add_device(struct device *dev)
+static struct iommu_device *viommu_probe_device(struct device *dev)
 {
int ret;
-   struct iommu_group *group;
struct viommu_endpoint *vdev;
struct viommu_dev *viommu = NULL;
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
 
if (!fwspec || fwspec->ops != _ops)
-   return -ENODEV;
+   return ERR_PTR(-ENODEV);
 
viommu = viommu_get_by_fwnode(fwspec->iommu_fwnode);
if (!viommu)
-   return -ENODEV;
+   return ERR_PTR(-ENODEV);
 
vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
if (!vdev)
-   return -ENOMEM;
+   return ERR_PTR(-ENOMEM);
 
vdev->dev = dev;
vdev->viommu = viommu;
@@ -896,45 +895,25 @@ static int viommu_add_device(struct device *dev)
goto err_free_dev;
}
 
-   ret = iommu_device_link(>iommu, dev);
-   if (ret)
-   goto err_free_dev;
+   return >iommu;
 
-   /*
-* Last step creates a default domain and attaches to it. Everything
-* must be ready.
-*/
-   group = iommu_group_get_for_dev(dev);
-   if (IS_ERR(group)) {
-   ret = PTR_ERR(group);
-   goto err_unlink_dev;
-   }
-
-   iommu_group_put(group);
-
-   return PTR_ERR_OR_ZERO(group);
-
-err_unlink_dev:
-   iommu_device_unlink(>iommu, dev);
 err_free_dev:
generic_iommu_put_resv_regions(dev, >resv_regions);
kfree(vdev);
 
-   return ret;
+   return ERR_PTR(ret);
 }
 
-static void viommu_remove_device(struct device *dev)
+static void viommu_release_device(struct device *dev)
 {
-   struct viommu_endpoint *vdev;
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+   struct viommu_endpoint *vdev;
 
if (!fwspec || fwspec->ops != _ops)
return;
 
vdev = dev_iommu_priv_get(dev);
 
-   iommu_group_remove_device(dev);
-   iommu_device_unlink(>viommu->iommu, dev);
generic_iommu_put_resv_regions(dev, >resv_regions);
kfree(vdev);
 }
@@ -960,8 +939,8 @@ static struct iommu_ops viommu_ops = {
.unmap  = viommu_unmap,
.iova_to_phys   = viommu_iova_to_phys,
.iotlb_sync = viommu_iotlb_sync,
-   .add_device = viommu_add_device,
-   .remove_device  = viommu_remove_device,
+   .probe_device   = viommu_probe_device,
+   .release_device = viommu_release_device,
.device_group   = viommu_device_group,
.get_resv_regions   = viommu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 22/34] iommu/msm: Convert to probe/release_device() call-backs

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

Convert the MSM IOMMU driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/msm_iommu.c | 34 +++---
 1 file changed, 7 insertions(+), 27 deletions(-)

diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index 94a6df1bddd6..10cd4db0710a 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -388,43 +388,23 @@ static struct msm_iommu_dev *find_iommu_for_dev(struct 
device *dev)
return ret;
 }
 
-static int msm_iommu_add_device(struct device *dev)
+static struct iommu_device *msm_iommu_probe_device(struct device *dev)
 {
struct msm_iommu_dev *iommu;
-   struct iommu_group *group;
unsigned long flags;
 
spin_lock_irqsave(_iommu_lock, flags);
iommu = find_iommu_for_dev(dev);
spin_unlock_irqrestore(_iommu_lock, flags);
 
-   if (iommu)
-   iommu_device_link(>iommu, dev);
-   else
-   return -ENODEV;
-
-   group = iommu_group_get_for_dev(dev);
-   if (IS_ERR(group))
-   return PTR_ERR(group);
-
-   iommu_group_put(group);
+   if (!iommu)
+   return ERR_PTR(-ENODEV);
 
-   return 0;
+   return >iommu;
 }
 
-static void msm_iommu_remove_device(struct device *dev)
+static void msm_iommu_release_device(struct device *dev)
 {
-   struct msm_iommu_dev *iommu;
-   unsigned long flags;
-
-   spin_lock_irqsave(_iommu_lock, flags);
-   iommu = find_iommu_for_dev(dev);
-   spin_unlock_irqrestore(_iommu_lock, flags);
-
-   if (iommu)
-   iommu_device_unlink(>iommu, dev);
-
-   iommu_group_remove_device(dev);
 }
 
 static int msm_iommu_attach_dev(struct iommu_domain *domain, struct device 
*dev)
@@ -708,8 +688,8 @@ static struct iommu_ops msm_iommu_ops = {
 */
.iotlb_sync = NULL,
.iova_to_phys = msm_iommu_iova_to_phys,
-   .add_device = msm_iommu_add_device,
-   .remove_device = msm_iommu_remove_device,
+   .probe_device = msm_iommu_probe_device,
+   .release_device = msm_iommu_release_device,
.device_group = generic_device_group,
.pgsize_bitmap = MSM_IOMMU_PGSIZES,
.of_xlate = qcom_iommu_of_xlate,
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 08/34] iommu: Move default domain allocation to iommu_probe_device()

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

Well, not really. The call to iommu_alloc_default_domain() in
iommu_group_get_for_dev() has to stay around as long as there are
IOMMU drivers using the add/remove_device() call-backs instead of
probe/release_device().

Those drivers expect that iommu_group_get_for_dev() returns the device
attached to a group and the group set up with a default domain (and
the device attached to the groups current domain).

But when all drivers are converted this compatability mess can be
removed.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 102 +-
 1 file changed, 71 insertions(+), 31 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 6cfe7799dc8c..7a385c18e1a5 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -79,6 +79,16 @@ static bool iommu_cmd_line_dma_api(void)
return !!(iommu_cmd_line & IOMMU_CMD_LINE_DMA_API);
 }
 
+static int iommu_alloc_default_domain(struct device *dev);
+static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
+unsigned type);
+static int __iommu_attach_device(struct iommu_domain *domain,
+struct device *dev);
+static int __iommu_attach_group(struct iommu_domain *domain,
+   struct iommu_group *group);
+static void __iommu_detach_group(struct iommu_domain *domain,
+struct iommu_group *group);
+
 #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)  \
 struct iommu_group_attribute iommu_group_attr_##_name =\
__ATTR(_name, _mode, _show, _store)
@@ -221,10 +231,29 @@ int iommu_probe_device(struct device *dev)
goto err_free_dev_param;
}
 
-   if (ops->probe_device)
+   if (ops->probe_device) {
+   struct iommu_group *group;
+
ret = __iommu_probe_device(dev);
-   else
+
+   /*
+* Try to allocate a default domain - needs support from the
+* IOMMU driver. There are still some drivers which don't
+* support default domains, so the return value is not yet
+* checked.
+*/
+   if (!ret)
+   iommu_alloc_default_domain(dev);
+
+   group = iommu_group_get(dev);
+   if (group && group->default_domain) {
+   ret = __iommu_attach_device(group->default_domain, dev);
+   iommu_group_put(group);
+   }
+
+   } else {
ret = ops->add_device(dev);
+   }
 
if (ret)
goto err_module_put;
@@ -268,15 +297,6 @@ void iommu_release_device(struct device *dev)
dev_iommu_free(dev);
 }
 
-static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
-unsigned type);
-static int __iommu_attach_device(struct iommu_domain *domain,
-struct device *dev);
-static int __iommu_attach_group(struct iommu_domain *domain,
-   struct iommu_group *group);
-static void __iommu_detach_group(struct iommu_domain *domain,
-struct iommu_group *group);
-
 static int __init iommu_set_def_domain_type(char *str)
 {
bool pt;
@@ -1423,25 +1443,18 @@ static int iommu_get_def_domain_type(struct device *dev)
return (type == 0) ? iommu_def_domain_type : type;
 }
 
-static int iommu_alloc_default_domain(struct device *dev,
- struct iommu_group *group)
+static int iommu_group_alloc_default_domain(struct bus_type *bus,
+   struct iommu_group *group,
+   unsigned int type)
 {
struct iommu_domain *dom;
-   unsigned int type;
-
-   if (group->default_domain)
-   return 0;
 
-   type = iommu_get_def_domain_type(dev);
-
-   dom = __iommu_domain_alloc(dev->bus, type);
+   dom = __iommu_domain_alloc(bus, type);
if (!dom && type != IOMMU_DOMAIN_DMA) {
-   dom = __iommu_domain_alloc(dev->bus, IOMMU_DOMAIN_DMA);
-   if (dom) {
-   dev_warn(dev,
-"failed to allocate default IOMMU domain of 
type %u; falling back to IOMMU_DOMAIN_DMA",
-type);
-   }
+   dom = __iommu_domain_alloc(bus, IOMMU_DOMAIN_DMA);
+   if (dom)
+   pr_warn("Failed to allocate default IOMMU domain of 
type %u for group %s - Falling back to IOMMU_DOMAIN_DMA",
+   type, group->name);
}
 
if (!dom)
@@ -1461,6 +1474,23 @@ static int iommu_alloc_default_domain(struct device *dev,
return 0;
 }
 
+static int iommu_alloc_default_domain(struct device *dev)

[RFC PATCH 28/34] iommu/renesas: Convert to probe/release_device() call-backs

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

Convert the Renesas IOMMU driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/ipmmu-vmsa.c | 60 +-
 1 file changed, 20 insertions(+), 40 deletions(-)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 310cf09feea3..fb7e702dee23 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -805,24 +805,8 @@ static int ipmmu_of_xlate(struct device *dev,
 static int ipmmu_init_arm_mapping(struct device *dev)
 {
struct ipmmu_vmsa_device *mmu = to_ipmmu(dev);
-   struct iommu_group *group;
int ret;
 
-   /* Create a device group and add the device to it. */
-   group = iommu_group_alloc();
-   if (IS_ERR(group)) {
-   dev_err(dev, "Failed to allocate IOMMU group\n");
-   return PTR_ERR(group);
-   }
-
-   ret = iommu_group_add_device(group, dev);
-   iommu_group_put(group);
-
-   if (ret < 0) {
-   dev_err(dev, "Failed to add device to IPMMU group\n");
-   return ret;
-   }
-
/*
 * Create the ARM mapping, used by the ARM DMA mapping core to allocate
 * VAs. This will allocate a corresponding IOMMU domain.
@@ -856,48 +840,39 @@ static int ipmmu_init_arm_mapping(struct device *dev)
return 0;
 
 error:
-   iommu_group_remove_device(dev);
if (mmu->mapping)
arm_iommu_release_mapping(mmu->mapping);
 
return ret;
 }
 
-static int ipmmu_add_device(struct device *dev)
+static struct iommu_device *ipmmu_probe_device(struct device *dev)
 {
struct ipmmu_vmsa_device *mmu = to_ipmmu(dev);
-   struct iommu_group *group;
-   int ret;
 
/*
 * Only let through devices that have been verified in xlate()
 */
if (!mmu)
-   return -ENODEV;
+   return ERR_PTR(-ENODEV);
 
-   if (IS_ENABLED(CONFIG_ARM) && !IS_ENABLED(CONFIG_IOMMU_DMA)) {
-   ret = ipmmu_init_arm_mapping(dev);
-   if (ret)
-   return ret;
-   } else {
-   group = iommu_group_get_for_dev(dev);
-   if (IS_ERR(group))
-   return PTR_ERR(group);
+   return >iommu;
+}
 
-   iommu_group_put(group);
-   }
+static void ipmmu_probe_finalize(struct device *dev)
+{
+   int ret = 0;
 
-   iommu_device_link(>iommu, dev);
-   return 0;
+   if (IS_ENABLED(CONFIG_ARM) && !IS_ENABLED(CONFIG_IOMMU_DMA))
+   ret = ipmmu_init_arm_mapping(dev);
+
+   if (ret)
+   dev_err(dev, "Can't create IOMMU mapping - DMA-OPS will not 
work\n");
 }
 
-static void ipmmu_remove_device(struct device *dev)
+static void ipmmu_release_device(struct device *dev)
 {
-   struct ipmmu_vmsa_device *mmu = to_ipmmu(dev);
-
-   iommu_device_unlink(>iommu, dev);
arm_iommu_detach_device(dev);
-   iommu_group_remove_device(dev);
 }
 
 static struct iommu_group *ipmmu_find_group(struct device *dev)
@@ -925,9 +900,14 @@ static const struct iommu_ops ipmmu_ops = {
.flush_iotlb_all = ipmmu_flush_iotlb_all,
.iotlb_sync = ipmmu_iotlb_sync,
.iova_to_phys = ipmmu_iova_to_phys,
-   .add_device = ipmmu_add_device,
-   .remove_device = ipmmu_remove_device,
+   .probe_device = ipmmu_probe_device,
+   .release_device = ipmmu_release_device,
+   .probe_finalize = ipmmu_probe_finalize,
+#if defined(CONFIG_ARM) && !defined(CONFIG_IOMMU_DMA)
+   .device_group = generic_device_group,
+#else
.device_group = ipmmu_find_group,
+#endif
.pgsize_bitmap = SZ_1G | SZ_2M | SZ_4K,
.of_xlate = ipmmu_of_xlate,
 };
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 15/34] iommu/amd: Convert to probe/release_device() call-backs

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

Convert the AMD IOMMU Driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd_iommu.c | 71 ---
 1 file changed, 22 insertions(+), 49 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 0b4b4faa876d..c30367413683 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -343,21 +343,9 @@ static bool check_device(struct device *dev)
return true;
 }
 
-static void init_iommu_group(struct device *dev)
-{
-   struct iommu_group *group;
-
-   group = iommu_group_get_for_dev(dev);
-   if (IS_ERR(group))
-   return;
-
-   iommu_group_put(group);
-}
-
 static int iommu_init_device(struct device *dev)
 {
struct iommu_dev_data *dev_data;
-   struct amd_iommu *iommu;
int devid;
 
if (dev->archdata.iommu)
@@ -367,8 +355,6 @@ static int iommu_init_device(struct device *dev)
if (devid < 0)
return devid;
 
-   iommu = amd_iommu_rlookup_table[devid];
-
dev_data = find_dev_data(devid);
if (!dev_data)
return -ENOMEM;
@@ -391,8 +377,6 @@ static int iommu_init_device(struct device *dev)
 
dev->archdata.iommu = dev_data;
 
-   iommu_device_link(>iommu, dev);
-
return 0;
 }
 
@@ -410,7 +394,7 @@ static void iommu_ignore_device(struct device *dev)
setup_aliases(dev);
 }
 
-static void iommu_uninit_device(struct device *dev)
+static void amd_iommu_uninit_device(struct device *dev)
 {
struct iommu_dev_data *dev_data;
struct amd_iommu *iommu;
@@ -429,13 +413,6 @@ static void iommu_uninit_device(struct device *dev)
if (dev_data->domain)
detach_device(dev);
 
-   iommu_device_unlink(>iommu, dev);
-
-   iommu_group_remove_device(dev);
-
-   /* Remove dma-ops */
-   dev->dma_ops = NULL;
-
/*
 * We keep dev_data around for unplugged devices and reuse it when the
 * device is re-plugged - not doing so would introduce a ton of races.
@@ -2152,55 +2129,50 @@ static void detach_device(struct device *dev)
spin_unlock_irqrestore(>lock, flags);
 }
 
-static int amd_iommu_add_device(struct device *dev)
+static struct iommu_device *amd_iommu_probe_device(struct device *dev)
 {
-   struct iommu_dev_data *dev_data;
-   struct iommu_domain *domain;
+   struct iommu_device *iommu_dev;
struct amd_iommu *iommu;
int ret, devid;
 
-   if (get_dev_data(dev))
-   return 0;
-
if (!check_device(dev))
-   return -ENODEV;
+   return ERR_PTR(-ENODEV);
 
devid = get_device_id(dev);
if (devid < 0)
-   return devid;
+   return ERR_PTR(devid);
 
iommu = amd_iommu_rlookup_table[devid];
 
+   if (get_dev_data(dev))
+   return >iommu;
+
ret = iommu_init_device(dev);
if (ret) {
if (ret != -ENOTSUPP)
dev_err(dev, "Failed to initialize - trying to proceed 
anyway\n");
-
+   iommu_dev = ERR_PTR(ret);
iommu_ignore_device(dev);
-   dev->dma_ops = NULL;
-   goto out;
+   } else {
+   iommu_dev = >iommu;
}
-   init_iommu_group(dev);
 
-   dev_data = get_dev_data(dev);
+   iommu_completion_wait(iommu);
 
-   BUG_ON(!dev_data);
+   return iommu_dev;
+}
 
-   if (dev_data->iommu_v2)
-   iommu_request_dm_for_dev(dev);
+static void amd_iommu_probe_finalize(struct device *dev)
+{
+   struct iommu_domain *domain;
 
/* Domains are initialized for this device - have a look what we ended 
up with */
domain = iommu_get_domain_for_dev(dev);
if (domain->type == IOMMU_DOMAIN_DMA)
iommu_setup_dma_ops(dev, IOVA_START_PFN << PAGE_SHIFT, 0);
-
-out:
-   iommu_completion_wait(iommu);
-
-   return 0;
 }
 
-static void amd_iommu_remove_device(struct device *dev)
+static void amd_iommu_release_device(struct device *dev)
 {
struct amd_iommu *iommu;
int devid;
@@ -2214,7 +2186,7 @@ static void amd_iommu_remove_device(struct device *dev)
 
iommu = amd_iommu_rlookup_table[devid];
 
-   iommu_uninit_device(dev);
+   amd_iommu_uninit_device(dev);
iommu_completion_wait(iommu);
 }
 
@@ -2687,8 +2659,9 @@ const struct iommu_ops amd_iommu_ops = {
.map = amd_iommu_map,
.unmap = amd_iommu_unmap,
.iova_to_phys = amd_iommu_iova_to_phys,
-   .add_device = amd_iommu_add_device,
-   .remove_device = amd_iommu_remove_device,
+   .probe_device = amd_iommu_probe_device,
+   .release_device = amd_iommu_release_device,
+   .probe_finalize = amd_iommu_probe_finalize,
.device_group = 

[RFC PATCH 32/34] iommu/exynos: Convert to probe/release_device() call-backs

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

Convert the Exynos IOMMU drivers to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/exynos-iommu.c | 23 ++-
 1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 86ecccbf0438..f865c9093722 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -1223,19 +1223,13 @@ static phys_addr_t exynos_iommu_iova_to_phys(struct 
iommu_domain *iommu_domain,
return phys;
 }
 
-static int exynos_iommu_add_device(struct device *dev)
+static struct iommu_device *exynos_iommu_probe_device(struct device *dev)
 {
struct exynos_iommu_owner *owner = dev->archdata.iommu;
struct sysmmu_drvdata *data;
-   struct iommu_group *group;
 
if (!has_sysmmu(dev))
-   return -ENODEV;
-
-   group = iommu_group_get_for_dev(dev);
-
-   if (IS_ERR(group))
-   return PTR_ERR(group);
+   return ERR_PTR(-ENODEV);
 
list_for_each_entry(data, >controllers, owner_node) {
/*
@@ -1247,14 +1241,11 @@ static int exynos_iommu_add_device(struct device *dev)
 DL_FLAG_STATELESS |
 DL_FLAG_PM_RUNTIME);
}
-   iommu_group_put(group);
 
-   iommu_device_link(>iommu, dev);
-
-   return 0;
+   return >iommu;
 }
 
-static void exynos_iommu_remove_device(struct device *dev)
+static void exynos_iommu_release_device(struct device *dev)
 {
struct exynos_iommu_owner *owner = dev->archdata.iommu;
struct sysmmu_drvdata *data;
@@ -1272,8 +1263,6 @@ static void exynos_iommu_remove_device(struct device *dev)
iommu_group_put(group);
}
}
-   iommu_device_unlink(>iommu, dev);
-   iommu_group_remove_device(dev);
 
list_for_each_entry(data, >controllers, owner_node)
device_link_del(data->link);
@@ -1381,8 +1370,8 @@ static const struct iommu_ops exynos_iommu_ops = {
.unmap = exynos_iommu_unmap,
.iova_to_phys = exynos_iommu_iova_to_phys,
.device_group = generic_device_group,
-   .add_device = exynos_iommu_add_device,
-   .remove_device = exynos_iommu_remove_device,
+   .probe_device = exynos_iommu_probe_device,
+   .release_device = exynos_iommu_release_device,
.pgsize_bitmap = SECT_SIZE | LPAGE_SIZE | SPAGE_SIZE,
.of_xlate = exynos_iommu_of_xlate,
 };
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 24/34] iommu/mediatek-v1 Convert to probe/release_device() call-backs

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

Convert the Mediatek-v1 IOMMU driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/mtk_iommu_v1.c | 50 +++-
 1 file changed, 20 insertions(+), 30 deletions(-)

diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index a31be05601c9..7bdd74c7cb9f 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -416,14 +416,12 @@ static int mtk_iommu_create_mapping(struct device *dev,
return 0;
 }
 
-static int mtk_iommu_add_device(struct device *dev)
+static struct iommu_device *mtk_iommu_probe_device(struct device *dev)
 {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-   struct dma_iommu_mapping *mtk_mapping;
struct of_phandle_args iommu_spec;
struct of_phandle_iterator it;
struct mtk_iommu_data *data;
-   struct iommu_group *group;
int err;
 
of_for_each_phandle(, err, dev->of_node, "iommus",
@@ -442,35 +440,28 @@ static int mtk_iommu_add_device(struct device *dev)
}
 
if (!fwspec || fwspec->ops != _iommu_ops)
-   return -ENODEV; /* Not a iommu client device */
+   return ERR_PTR(-ENODEV); /* Not a iommu client device */
 
-   /*
-* This is a short-term bodge because the ARM DMA code doesn't
-* understand multi-device groups, but we have to call into it
-* successfully (and not just rely on a normal IOMMU API attach
-* here) in order to set the correct DMA API ops on @dev.
-*/
-   group = iommu_group_alloc();
-   if (IS_ERR(group))
-   return PTR_ERR(group);
+   data = dev_iommu_priv_get(dev);
 
-   err = iommu_group_add_device(group, dev);
-   iommu_group_put(group);
-   if (err)
-   return err;
+   return >iommu;
+}
 
-   data = dev_iommu_priv_get(dev);
+static void mtk_iommu_probe_finalize(struct device *dev)
+{
+   struct dma_iommu_mapping *mtk_mapping;
+   struct mtk_iommu_data *data;
+   int err;
+
+   data= dev_iommu_priv_get(dev);
mtk_mapping = data->dev->archdata.iommu;
-   err = arm_iommu_attach_device(dev, mtk_mapping);
-   if (err) {
-   iommu_group_remove_device(dev);
-   return err;
-   }
 
-   return iommu_device_link(>iommu, dev);
+   err = arm_iommu_attach_device(dev, mtk_mapping);
+   if (err)
+   dev_err(dev, "Can't create IOMMU mapping - DMA-OPS will not 
work\n");
 }
 
-static void mtk_iommu_remove_device(struct device *dev)
+static void mtk_iommu_release_device(struct device *dev)
 {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct mtk_iommu_data *data;
@@ -479,9 +470,6 @@ static void mtk_iommu_remove_device(struct device *dev)
return;
 
data = dev_iommu_priv_get(dev);
-   iommu_device_unlink(>iommu, dev);
-
-   iommu_group_remove_device(dev);
iommu_fwspec_free(dev);
 }
 
@@ -534,8 +522,10 @@ static const struct iommu_ops mtk_iommu_ops = {
.map= mtk_iommu_map,
.unmap  = mtk_iommu_unmap,
.iova_to_phys   = mtk_iommu_iova_to_phys,
-   .add_device = mtk_iommu_add_device,
-   .remove_device  = mtk_iommu_remove_device,
+   .probe_device   = mtk_iommu_probe_device,
+   .probe_finalize = mtk_iommu_probe_finalize,
+   .release_device = mtk_iommu_release_device,
+   .device_group   = generic_device_group,
.pgsize_bitmap  = ~0UL << MT2701_IOMMU_PAGE_SHIFT,
 };
 
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 07/34] iommu: Add probe_device() and remove_device() call-backs

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

Add call-backs to 'struct iommu_ops' as an alternative to the
add_device() and remove_device() call-backs, which will be removed when
all drivers are converted.

The new call-backs will not setupt IOMMU groups and domains anymore,
so also add a probe_finalize() call-back where the IOMMU driver can do
per-device setup work which require the device to be set up with a
group and a domain.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 63 ++-
 include/linux/iommu.h |  9 +++
 2 files changed, 66 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5877abd9b693..6cfe7799dc8c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -174,6 +174,36 @@ static void dev_iommu_free(struct device *dev)
dev->iommu = NULL;
 }
 
+static int __iommu_probe_device(struct device *dev)
+{
+   const struct iommu_ops *ops = dev->bus->iommu_ops;
+   struct iommu_device *iommu_dev;
+   struct iommu_group *group;
+   int ret;
+
+   iommu_dev = ops->probe_device(dev);
+   if (IS_ERR(iommu_dev))
+   return PTR_ERR(iommu_dev);
+
+   dev->iommu->iommu_dev = iommu_dev;
+
+   group = iommu_group_get_for_dev(dev);
+   if (!IS_ERR(group)) {
+   ret = PTR_ERR(group);
+   goto out_release;
+   }
+   iommu_group_put(group);
+
+   iommu_device_link(iommu_dev, dev);
+
+   return 0;
+
+out_release:
+   ops->release_device(dev);
+
+   return ret;
+}
+
 int iommu_probe_device(struct device *dev)
 {
const struct iommu_ops *ops = dev->bus->iommu_ops;
@@ -191,10 +221,17 @@ int iommu_probe_device(struct device *dev)
goto err_free_dev_param;
}
 
-   ret = ops->add_device(dev);
+   if (ops->probe_device)
+   ret = __iommu_probe_device(dev);
+   else
+   ret = ops->add_device(dev);
+
if (ret)
goto err_module_put;
 
+   if (ops->probe_finalize)
+   ops->probe_finalize(dev);
+
return 0;
 
 err_module_put:
@@ -204,17 +241,31 @@ int iommu_probe_device(struct device *dev)
return ret;
 }
 
+static void __iommu_release_device(struct device *dev)
+{
+   const struct iommu_ops *ops = dev->bus->iommu_ops;
+
+   iommu_device_unlink(dev->iommu->iommu_dev, dev);
+
+   iommu_group_remove_device(dev);
+
+   ops->release_device(dev);
+}
+
 void iommu_release_device(struct device *dev)
 {
const struct iommu_ops *ops = dev->bus->iommu_ops;
 
-   if (dev->iommu_group)
+   if (!dev->iommu)
+   return;
+
+   if (ops->release_device)
+   __iommu_release_device(dev);
+   else if (dev->iommu_group)
ops->remove_device(dev);
 
-   if (dev->iommu) {
-   module_put(ops->owner);
-   dev_iommu_free(dev);
-   }
+   module_put(ops->owner);
+   dev_iommu_free(dev);
 }
 
 static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 1f027b07e499..30170d191e5e 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -225,6 +225,10 @@ struct iommu_iotlb_gather {
  * @iova_to_phys: translate iova to physical address
  * @add_device: add device to iommu grouping
  * @remove_device: remove device from iommu grouping
+ * @probe_device: Add device to iommu driver handling
+ * @release_device: Remove device from iommu driver handling
+ * @probe_finalize: Do final setup work after the device is added to an IOMMU
+ *  group and attached to the groups domain
  * @device_group: find iommu group for a particular device
  * @domain_get_attr: Query domain attributes
  * @domain_set_attr: Change domain attributes
@@ -275,6 +279,9 @@ struct iommu_ops {
phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t 
iova);
int (*add_device)(struct device *dev);
void (*remove_device)(struct device *dev);
+   struct iommu_device *(*probe_device)(struct device *dev);
+   void (*release_device)(struct device *dev);
+   void (*probe_finalize)(struct device *dev);
struct iommu_group *(*device_group)(struct device *dev);
int (*domain_get_attr)(struct iommu_domain *domain,
   enum iommu_attr attr, void *data);
@@ -375,6 +382,7 @@ struct iommu_fault_param {
  *
  * @fault_param: IOMMU detected device fault reporting data
  * @fwspec: IOMMU fwspec data
+ * @iommu_dev:  IOMMU device this device is linked to
  * @priv:   IOMMU Driver private data
  *
  * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
@@ -384,6 +392,7 @@ struct dev_iommu {
struct mutex lock;
struct iommu_fault_param*fault_param;
struct iommu_fwspec *fwspec;
+   struct iommu_device *iommu_dev;
void

[RFC PATCH 33/34] iommu: Remove add_device()/remove_device() code-paths

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

All drivers are converted to use the probe/release_device()
call-backs, so the add_device/remove_device() pointers are unused and
the code using them can be removed.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 145 --
 include/linux/iommu.h |   4 --
 2 files changed, 27 insertions(+), 122 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index cf25c1e48830..d9032f9d597c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -220,7 +220,7 @@ static int __iommu_probe_device(struct device *dev, struct 
list_head *group_list
return ret;
 }
 
-static int __iommu_probe_device_helper(struct device *dev)
+int iommu_probe_device(struct device *dev)
 {
const struct iommu_ops *ops = dev->bus->iommu_ops;
struct iommu_group *group;
@@ -264,70 +264,17 @@ static int __iommu_probe_device_helper(struct device *dev)
 
 }
 
-int iommu_probe_device(struct device *dev)
+void iommu_release_device(struct device *dev)
 {
const struct iommu_ops *ops = dev->bus->iommu_ops;
-   struct iommu_group *group;
-   int ret;
-
-   WARN_ON(dev->iommu_group);
-
-   if (!ops)
-   return -EINVAL;
-
-   if (!dev_iommu_get(dev))
-   return -ENOMEM;
-
-   if (!try_module_get(ops->owner)) {
-   ret = -EINVAL;
-   goto err_free_dev_param;
-   }
-
-   if (ops->probe_device)
-   return __iommu_probe_device_helper(dev);
-
-   ret = ops->add_device(dev);
-   if (ret)
-   goto err_module_put;
 
-   group = iommu_group_get(dev);
-   iommu_create_device_direct_mappings(group, dev);
-   iommu_group_put(group);
-
-   if (ops->probe_finalize)
-   ops->probe_finalize(dev);
-
-   return 0;
-
-err_module_put:
-   module_put(ops->owner);
-err_free_dev_param:
-   dev_iommu_free(dev);
-   return ret;
-}
-
-static void __iommu_release_device(struct device *dev)
-{
-   const struct iommu_ops *ops = dev->bus->iommu_ops;
+   if (!dev->iommu)
+   return;
 
iommu_device_unlink(dev->iommu->iommu_dev, dev);
-
iommu_group_remove_device(dev);
 
ops->release_device(dev);
-}
-
-void iommu_release_device(struct device *dev)
-{
-   const struct iommu_ops *ops = dev->bus->iommu_ops;
-
-   if (!dev->iommu)
-   return;
-
-   if (ops->release_device)
-   __iommu_release_device(dev);
-   else if (dev->iommu_group)
-   ops->remove_device(dev);
 
module_put(ops->owner);
dev_iommu_free(dev);
@@ -1560,23 +1507,6 @@ struct iommu_group *iommu_group_get_for_dev(struct 
device *dev)
if (ret)
goto out_put_group;
 
-   /*
-* Try to allocate a default domain - needs support from the
-* IOMMU driver. There are still some drivers which don't support
-* default domains, so the return value is not yet checked. Only
-* allocate the domain here when the driver still has the
-* add_device/remove_device call-backs implemented.
-*/
-   if (!ops->probe_device) {
-   iommu_alloc_default_domain(dev);
-
-   if (group->default_domain)
-   ret = __iommu_attach_device(group->default_domain, dev);
-
-   if (ret)
-   goto out_put_group;
-   }
-
return group;
 
 out_put_group:
@@ -1591,21 +1521,6 @@ struct iommu_domain *iommu_group_default_domain(struct 
iommu_group *group)
return group->default_domain;
 }
 
-static int add_iommu_group(struct device *dev, void *data)
-{
-   int ret = iommu_probe_device(dev);
-
-   /*
-* We ignore -ENODEV errors for now, as they just mean that the
-* device is not translated by an IOMMU. We still care about
-* other errors and fail to initialize when they happen.
-*/
-   if (ret == -ENODEV)
-   ret = 0;
-
-   return ret;
-}
-
 static int probe_iommu_group(struct device *dev, void *data)
 {
const struct iommu_ops *ops = dev->bus->iommu_ops;
@@ -1789,45 +1704,39 @@ static int iommu_group_create_direct_mappings(struct 
iommu_group *group)
 
 int bus_iommu_probe(struct bus_type *bus)
 {
-   const struct iommu_ops *ops = bus->iommu_ops;
+   struct iommu_group *group, *next;
+   LIST_HEAD(group_list);
int ret;
 
-   if (ops->probe_device) {
-   struct iommu_group *group, *next;
-   LIST_HEAD(group_list);
-
-   /*
-* This code-path does not allocate the default domain when
-* creating the iommu group, so do it after the groups are
-* created.
-*/
-   ret = bus_for_each_dev(bus, NULL, _list, 
probe_iommu_group);
-   if (ret)
-   return ret;
+   /*
+* This code-path does 

[RFC PATCH 29/34] iommu/omap: Remove orphan_dev tracking

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

Remove the tracking of device which could not be probed because
their IOMMU is not probed yet. Replace it with a call to
bus_iommu_probe() when a new IOMMU is probed.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/omap-iommu.c | 54 +++---
 1 file changed, 4 insertions(+), 50 deletions(-)

diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index 887fefcb03b4..ecc9d0829a91 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -35,15 +35,6 @@
 
 static const struct iommu_ops omap_iommu_ops;
 
-struct orphan_dev {
-   struct device *dev;
-   struct list_head node;
-};
-
-static LIST_HEAD(orphan_dev_list);
-
-static DEFINE_SPINLOCK(orphan_lock);
-
 #define to_iommu(dev)  ((struct omap_iommu *)dev_get_drvdata(dev))
 
 /* bitmap of the page sizes currently supported */
@@ -62,8 +53,6 @@ static DEFINE_SPINLOCK(orphan_lock);
 static struct platform_driver omap_iommu_driver;
 static struct kmem_cache *iopte_cachep;
 
-static int _omap_iommu_add_device(struct device *dev);
-
 /**
  * to_omap_domain - Get struct omap_iommu_domain from generic iommu_domain
  * @dom:   generic iommu domain handle
@@ -1177,7 +1166,6 @@ static int omap_iommu_probe(struct platform_device *pdev)
struct omap_iommu *obj;
struct resource *res;
struct device_node *of = pdev->dev.of_node;
-   struct orphan_dev *orphan_dev, *tmp;
 
if (!of) {
pr_err("%s: only DT-based devices are supported\n", __func__);
@@ -1260,13 +1248,8 @@ static int omap_iommu_probe(struct platform_device *pdev)
 
dev_info(>dev, "%s registered\n", obj->name);
 
-   list_for_each_entry_safe(orphan_dev, tmp, _dev_list, node) {
-   err = _omap_iommu_add_device(orphan_dev->dev);
-   if (!err) {
-   list_del(_dev->node);
-   kfree(orphan_dev);
-   }
-   }
+   /* Re-probe bus to probe device attached to this IOMMU */
+   bus_iommu_probe(_bus_type);
 
return 0;
 
@@ -1657,7 +1640,7 @@ static phys_addr_t omap_iommu_iova_to_phys(struct 
iommu_domain *domain,
return ret;
 }
 
-static int _omap_iommu_add_device(struct device *dev)
+static int omap_iommu_add_device(struct device *dev)
 {
struct omap_iommu_arch_data *arch_data, *tmp;
struct omap_iommu *oiommu;
@@ -1666,8 +1649,6 @@ static int _omap_iommu_add_device(struct device *dev)
struct platform_device *pdev;
int num_iommus, i;
int ret;
-   struct orphan_dev *orphan_dev;
-   unsigned long flags;
 
/*
 * Allocate the archdata iommu structure for DT-based devices.
@@ -1702,23 +1683,7 @@ static int _omap_iommu_add_device(struct device *dev)
if (!pdev) {
of_node_put(np);
kfree(arch_data);
-   spin_lock_irqsave(_lock, flags);
-   list_for_each_entry(orphan_dev, _dev_list,
-   node) {
-   if (orphan_dev->dev == dev)
-   break;
-   }
-   spin_unlock_irqrestore(_lock, flags);
-
-   if (orphan_dev && orphan_dev->dev == dev)
-   return -EPROBE_DEFER;
-
-   orphan_dev = kzalloc(sizeof(*orphan_dev), GFP_KERNEL);
-   orphan_dev->dev = dev;
-   spin_lock_irqsave(_lock, flags);
-   list_add(_dev->node, _dev_list);
-   spin_unlock_irqrestore(_lock, flags);
-   return -EPROBE_DEFER;
+   return -ENODEV;
}
 
oiommu = platform_get_drvdata(pdev);
@@ -1764,17 +1729,6 @@ static int _omap_iommu_add_device(struct device *dev)
return 0;
 }
 
-static int omap_iommu_add_device(struct device *dev)
-{
-   int ret;
-
-   ret = _omap_iommu_add_device(dev);
-   if (ret == -EPROBE_DEFER)
-   return 0;
-
-   return ret;
-}
-
 static void omap_iommu_remove_device(struct device *dev)
 {
struct omap_iommu_arch_data *arch_data = dev->archdata.iommu;
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 06/34] iommu/amd: Return -ENODEV in add_device when device is not handled by IOMMU

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

When check_device() fails on the device, it is not handled by the
IOMMU and amd_iommu_add_device() needs to return -ENODEV.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd_iommu.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 504f2db75eda..3e0d27f7622e 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2157,9 +2157,12 @@ static int amd_iommu_add_device(struct device *dev)
struct amd_iommu *iommu;
int ret, devid;
 
-   if (!check_device(dev) || get_dev_data(dev))
+   if (get_dev_data(dev))
return 0;
 
+   if (!check_device(dev))
+   return -ENODEV;
+
devid = get_device_id(dev);
if (devid < 0)
return devid;
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 20/34] iommu/s390: Convert to probe/release_device() call-backs

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

Convert the S390 IOMMU driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/s390-iommu.c | 22 ++
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index 1137f3ddcb85..610f0828f22d 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -166,21 +166,14 @@ static void s390_iommu_detach_device(struct iommu_domain 
*domain,
}
 }
 
-static int s390_iommu_add_device(struct device *dev)
+static struct iommu_device *s390_iommu_probe_device(struct device *dev)
 {
-   struct iommu_group *group = iommu_group_get_for_dev(dev);
struct zpci_dev *zdev = to_pci_dev(dev)->sysdata;
 
-   if (IS_ERR(group))
-   return PTR_ERR(group);
-
-   iommu_group_put(group);
-   iommu_device_link(>iommu_dev, dev);
-
-   return 0;
+   return >iommu_dev;
 }
 
-static void s390_iommu_remove_device(struct device *dev)
+static void s390_iommu_release_device(struct device *dev)
 {
struct zpci_dev *zdev = to_pci_dev(dev)->sysdata;
struct iommu_domain *domain;
@@ -191,7 +184,7 @@ static void s390_iommu_remove_device(struct device *dev)
 * to vfio-pci and completing the VFIO_SET_IOMMU ioctl (which triggers
 * the attach_dev), removing the device via
 * "echo 1 > /sys/bus/pci/devices/.../remove" won't trigger detach_dev,
-* only remove_device will be called via the BUS_NOTIFY_REMOVED_DEVICE
+* only release_device will be called via the BUS_NOTIFY_REMOVED_DEVICE
 * notifier.
 *
 * So let's call detach_dev from here if it hasn't been called before.
@@ -201,9 +194,6 @@ static void s390_iommu_remove_device(struct device *dev)
if (domain)
s390_iommu_detach_device(domain, dev);
}
-
-   iommu_device_unlink(>iommu_dev, dev);
-   iommu_group_remove_device(dev);
 }
 
 static int s390_iommu_update_trans(struct s390_domain *s390_domain,
@@ -373,8 +363,8 @@ static const struct iommu_ops s390_iommu_ops = {
.map = s390_iommu_map,
.unmap = s390_iommu_unmap,
.iova_to_phys = s390_iommu_iova_to_phys,
-   .add_device = s390_iommu_add_device,
-   .remove_device = s390_iommu_remove_device,
+   .probe_device = s390_iommu_probe_device,
+   .release_device = s390_iommu_release_device,
.device_group = generic_device_group,
.pgsize_bitmap = S390_IOMMU_PGSIZES,
 };
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 11/34] iommu: Split off default domain allocation from group assignment

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

When a bus is initialized with iommu-ops, all devices on the bus are
scanned and iommu-groups are allocated for them, and each groups will
also get a default domain allocated.

Until now this happened as soon as the group was created and the first
device added to it. When other devices with different default domain
requirements were added to the group later on, the default domain was
re-allocated, if possible.

This resulted in some back and forth and unnecessary allocations, so
change the flow to defer default domain allocation until all devices
have been added to their respective IOMMU groups.

The default domains are allocated for newly allocated groups after
each device on the bus is handled and was probed by the IOMMU driver.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 152 +-
 1 file changed, 149 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8be047a4808f..44514e3e8ca2 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -199,7 +199,7 @@ static int __iommu_probe_device(struct device *dev, struct 
list_head *group_list
dev->iommu->iommu_dev = iommu_dev;
 
group = iommu_group_get_for_dev(dev);
-   if (!IS_ERR(group)) {
+   if (IS_ERR(group)) {
ret = PTR_ERR(group);
goto out_release;
}
@@ -1599,6 +1599,37 @@ static int add_iommu_group(struct device *dev, void 
*data)
return ret;
 }
 
+static int probe_iommu_group(struct device *dev, void *data)
+{
+   const struct iommu_ops *ops = dev->bus->iommu_ops;
+   struct list_head *group_list = data;
+   int ret;
+
+   if (!dev_iommu_get(dev))
+   return -ENOMEM;
+
+   if (!try_module_get(ops->owner)) {
+   ret = -EINVAL;
+   goto err_free_dev_iommu;
+   }
+
+   ret = __iommu_probe_device(dev, group_list);
+   if (ret)
+   goto err_module_put;
+
+   return 0;
+
+err_module_put:
+   module_put(ops->owner);
+err_free_dev_iommu:
+   dev_iommu_free(dev);
+
+   if (ret == -ENODEV)
+   ret = 0;
+
+   return ret;
+}
+
 static int remove_iommu_group(struct device *dev, void *data)
 {
iommu_release_device(dev);
@@ -1658,10 +1689,125 @@ static int iommu_bus_notifier(struct notifier_block 
*nb,
return 0;
 }
 
+struct __group_domain_type {
+   struct device *dev;
+   unsigned int type;
+};
+
+static int probe_get_default_domain_type(struct device *dev, void *data)
+{
+   const struct iommu_ops *ops = dev->bus->iommu_ops;
+   struct __group_domain_type *gtype = data;
+   unsigned int type = 0;
+
+   if (ops->def_domain_type)
+   type = ops->def_domain_type(dev);
+
+   if (type) {
+   if (gtype->type && gtype->type != type) {
+   dev_warn(dev, "Device needs domain type %s, but device 
%s in the same iommu group requires type %s - using default\n",
+iommu_domain_type_str(type),
+dev_name(gtype->dev),
+iommu_domain_type_str(gtype->type));
+   gtype->type = 0;
+   }
+
+   if (!gtype->dev) {
+   gtype->dev  = dev;
+   gtype->type = type;
+   }
+   }
+
+   return 0;
+}
+
+static void probe_alloc_default_domain(struct bus_type *bus,
+  struct iommu_group *group)
+{
+   struct __group_domain_type gtype;
+
+   memset(, 0, sizeof(gtype));
+
+   /* Ask for default domain requirements of all devices in the group */
+   __iommu_group_for_each_dev(group, ,
+  probe_get_default_domain_type);
+
+   if (!gtype.type)
+   gtype.type = iommu_def_domain_type;
+
+   iommu_group_alloc_default_domain(bus, group, gtype.type);
+}
+
+static int iommu_group_do_dma_attach(struct device *dev, void *data)
+{
+   struct iommu_domain *domain = data;
+   const struct iommu_ops *ops;
+   int ret;
+
+   ret = __iommu_attach_device(domain, dev);
+
+   ops = domain->ops;
+
+   if (ret == 0 && ops->probe_finalize)
+   ops->probe_finalize(dev);
+
+   return ret;
+}
+
+static int __iommu_group_dma_attach(struct iommu_group *group)
+{
+   return __iommu_group_for_each_dev(group, group->default_domain,
+ iommu_group_do_dma_attach);
+}
+
+static int bus_iommu_probe(struct bus_type *bus)
+{
+   const struct iommu_ops *ops = bus->iommu_ops;
+   int ret;
+
+   if (ops->probe_device) {
+   struct iommu_group *group, *next;
+   LIST_HEAD(group_list);
+
+   /*
+* This code-path does not allocate the default domain when
+* creating the iommu group, so do it after 

[RFC PATCH 12/34] iommu: Move iommu_group_create_direct_mappings() out of iommu_group_add_device()

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

After the previous changes the iommu group may not have a default
domain when iommu_group_add_device() is called. With no default domain
iommu_group_create_direct_mappings() will do nothing and no direct
mappings will be created.

Rename iommu_group_create_direct_mappings() to
iommu_create_device_direct_mappings() to better reflect that the
function creates direct mappings only for one device and not for all
devices in the group. Then move the call to the places where a default
domain actually exists.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 35 ++-
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 44514e3e8ca2..844613850595 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -89,6 +89,8 @@ static int __iommu_attach_group(struct iommu_domain *domain,
struct iommu_group *group);
 static void __iommu_detach_group(struct iommu_domain *domain,
 struct iommu_group *group);
+static int iommu_create_device_direct_mappings(struct iommu_group *group,
+  struct device *dev);
 
 #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)  \
 struct iommu_group_attribute iommu_group_attr_##_name =\
@@ -243,6 +245,8 @@ static int __iommu_probe_device_helper(struct device *dev)
if (group->default_domain)
ret = __iommu_attach_device(group->default_domain, dev);
 
+   iommu_create_device_direct_mappings(group, dev);
+
iommu_group_put(group);
 
if (ret)
@@ -263,6 +267,7 @@ static int __iommu_probe_device_helper(struct device *dev)
 int iommu_probe_device(struct device *dev)
 {
const struct iommu_ops *ops = dev->bus->iommu_ops;
+   struct iommu_group *group;
int ret;
 
WARN_ON(dev->iommu_group);
@@ -285,6 +290,10 @@ int iommu_probe_device(struct device *dev)
if (ret)
goto err_module_put;
 
+   group = iommu_group_get(dev);
+   iommu_create_device_direct_mappings(group, dev);
+   iommu_group_put(group);
+
if (ops->probe_finalize)
ops->probe_finalize(dev);
 
@@ -736,8 +745,8 @@ int iommu_group_set_name(struct iommu_group *group, const 
char *name)
 }
 EXPORT_SYMBOL_GPL(iommu_group_set_name);
 
-static int iommu_group_create_direct_mappings(struct iommu_group *group,
- struct device *dev)
+static int iommu_create_device_direct_mappings(struct iommu_group *group,
+  struct device *dev)
 {
struct iommu_domain *domain = group->default_domain;
struct iommu_resv_region *entry;
@@ -841,8 +850,6 @@ int iommu_group_add_device(struct iommu_group *group, 
struct device *dev)
 
dev->iommu_group = group;
 
-   iommu_group_create_direct_mappings(group, dev);
-
mutex_lock(>mutex);
list_add_tail(>list, >devices);
if (group->domain)
@@ -1736,6 +1743,7 @@ static void probe_alloc_default_domain(struct bus_type 
*bus,
gtype.type = iommu_def_domain_type;
 
iommu_group_alloc_default_domain(bus, group, gtype.type);
+
 }
 
 static int iommu_group_do_dma_attach(struct device *dev, void *data)
@@ -1760,6 +1768,21 @@ static int __iommu_group_dma_attach(struct iommu_group 
*group)
  iommu_group_do_dma_attach);
 }
 
+static int iommu_do_create_direct_mappings(struct device *dev, void *data)
+{
+   struct iommu_group *group = data;
+
+   iommu_create_device_direct_mappings(group, dev);
+
+   return 0;
+}
+
+static int iommu_group_create_direct_mappings(struct iommu_group *group)
+{
+   return __iommu_group_for_each_dev(group, group,
+ iommu_do_create_direct_mappings);
+}
+
 static int bus_iommu_probe(struct bus_type *bus)
 {
const struct iommu_ops *ops = bus->iommu_ops;
@@ -1790,6 +1813,8 @@ static int bus_iommu_probe(struct bus_type *bus)
if (!group->default_domain)
continue;
 
+   iommu_group_create_direct_mappings(group);
+
ret = __iommu_group_dma_attach(group);
 
mutex_unlock(>mutex);
@@ -2630,7 +2655,7 @@ request_default_domain_for_dev(struct device *dev, 
unsigned long type)
iommu_domain_free(group->default_domain);
group->default_domain = domain;
 
-   iommu_group_create_direct_mappings(group, dev);
+   iommu_create_device_direct_mappings(group, dev);
 
dev_info(dev, "Using iommu %s mapping\n",
 type == IOMMU_DOMAIN_DMA ? "dma" : "direct");
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 04/34] iommu/vt-d: Wire up iommu_ops->def_domain_type

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

The Intel VT-d driver already has a matching function to determine the
default domain type for a device. Wire it up in intel_iommu_ops.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/intel-iommu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index ef0a5246700e..b9f905a55dda 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -6209,6 +6209,7 @@ const struct iommu_ops intel_iommu_ops = {
.dev_enable_feat= intel_iommu_dev_enable_feat,
.dev_disable_feat   = intel_iommu_dev_disable_feat,
.is_attach_deferred = intel_iommu_is_attach_deferred,
+   .def_domain_type= device_def_domain_type,
.pgsize_bitmap  = INTEL_IOMMU_PGSIZES,
 };
 
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 31/34] iommu/exynos: Create iommu_device in struct exynos_iommu_owner

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

The 'struct exynos_iommu_owner' is an umbrella for multiple SYSMMU
instances attached to one master. As such all these instances are
handled the same, they are all configured with the same iommu_domain,
for example.

The IOMMU core code expects each device to have only one IOMMU
attached, so create the IOMMU-device for the umbrella instead of each
hardware SYSMMU.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/exynos-iommu.c | 96 +++-
 1 file changed, 73 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 186ff5cc975c..86ecccbf0438 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -235,6 +235,8 @@ struct exynos_iommu_owner {
struct list_head controllers;   /* list of sysmmu_drvdata.owner_node */
struct iommu_domain *domain;/* domain this device is attached */
struct mutex rpm_lock;  /* for runtime pm of all sysmmus */
+
+   struct iommu_device iommu;  /* IOMMU core handle */
 };
 
 /*
@@ -274,8 +276,6 @@ struct sysmmu_drvdata {
struct list_head owner_node;/* node for owner controllers list */
phys_addr_t pgtable;/* assigned page table structure */
unsigned int version;   /* our version */
-
-   struct iommu_device iommu;  /* IOMMU core handle */
 };
 
 static struct exynos_iommu_domain *to_exynos_domain(struct iommu_domain *dom)
@@ -625,18 +625,6 @@ static int exynos_sysmmu_probe(struct platform_device 
*pdev)
data->sysmmu = dev;
spin_lock_init(>lock);
 
-   ret = iommu_device_sysfs_add(>iommu, >dev, NULL,
-dev_name(data->sysmmu));
-   if (ret)
-   return ret;
-
-   iommu_device_set_ops(>iommu, _iommu_ops);
-   iommu_device_set_fwnode(>iommu, >of_node->fwnode);
-
-   ret = iommu_device_register(>iommu);
-   if (ret)
-   return ret;
-
platform_set_drvdata(pdev, data);
 
__sysmmu_get_version(data);
@@ -1261,6 +1249,8 @@ static int exynos_iommu_add_device(struct device *dev)
}
iommu_group_put(group);
 
+   iommu_device_link(>iommu, dev);
+
return 0;
 }
 
@@ -1282,18 +1272,82 @@ static void exynos_iommu_remove_device(struct device 
*dev)
iommu_group_put(group);
}
}
+   iommu_device_unlink(>iommu, dev);
iommu_group_remove_device(dev);
 
list_for_each_entry(data, >controllers, owner_node)
device_link_del(data->link);
 }
 
+static int exynos_iommu_device_init(struct exynos_iommu_owner *owner)
+{
+   static u32 counter = 0;
+   int ret;
+
+   /*
+* Create a virtual IOMMU device. In reality it is an umbrella for a
+* number of SYSMMU platform devices, but that also means that any
+* master can have more than one real IOMMU device. This drivers handles
+* all the real devices for one master synchronously, so they appear as
+* one anyway.
+*/
+   ret = iommu_device_sysfs_add(>iommu, NULL, NULL,
+"sysmmu-owner-%d", counter++);
+   if (ret)
+   return ret;
+
+   iommu_device_set_ops(>iommu, _iommu_ops);
+
+   return 0;
+}
+
+static void exynos_iommu_device_remove(struct exynos_iommu_owner *owner)
+{
+   iommu_device_set_ops(>iommu, NULL);
+   iommu_device_sysfs_remove(>iommu);
+}
+
+static int exynos_owner_init(struct device *dev)
+{
+   struct exynos_iommu_owner *owner = dev->archdata.iommu;
+   int ret;
+
+   if (owner)
+   return 0;
+
+   owner = kzalloc(sizeof(*owner), GFP_KERNEL);
+   if (!owner)
+   return -ENOMEM;
+
+   ret = exynos_iommu_device_init(owner);
+   if (ret)
+   goto out_free_owner;
+
+   ret = iommu_device_register(>iommu);
+   if (ret)
+   goto out_remove_iommu_device;
+
+   INIT_LIST_HEAD(>controllers);
+   mutex_init(>rpm_lock);
+   dev->archdata.iommu = owner;
+
+   return 0;
+
+out_remove_iommu_device:
+   exynos_iommu_device_remove(owner);
+out_free_owner:
+   kfree(owner);
+
+   return ret;
+}
+
 static int exynos_iommu_of_xlate(struct device *dev,
 struct of_phandle_args *spec)
 {
-   struct exynos_iommu_owner *owner = dev->archdata.iommu;
struct platform_device *sysmmu = of_find_device_by_node(spec->np);
struct sysmmu_drvdata *data, *entry;
+   struct exynos_iommu_owner *owner;
+   int ret;
 
if (!sysmmu)
return -ENODEV;
@@ -1302,15 +1356,11 @@ static int exynos_iommu_of_xlate(struct device *dev,
if (!data)
return -ENODEV;
 
-   if (!owner) {
-   owner = kzalloc(sizeof(*owner), GFP_KERNEL);
-   if (!owner)
-   return -ENOMEM;
+   

[RFC PATCH 27/34] iommu/tegra: Convert to probe/release_device() call-backs

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

Convert the Tegra IOMMU drivers to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/tegra-gart.c | 24 ++--
 drivers/iommu/tegra-smmu.c | 31 ---
 2 files changed, 14 insertions(+), 41 deletions(-)

diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index db6559e8336f..5fbdff6ff41a 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -243,28 +243,16 @@ static bool gart_iommu_capable(enum iommu_cap cap)
return false;
 }
 
-static int gart_iommu_add_device(struct device *dev)
+static struct iommu_device *gart_iommu_probe_device(struct device *dev)
 {
-   struct iommu_group *group;
-
if (!dev_iommu_fwspec_get(dev))
-   return -ENODEV;
-
-   group = iommu_group_get_for_dev(dev);
-   if (IS_ERR(group))
-   return PTR_ERR(group);
-
-   iommu_group_put(group);
+   return ERR_PTR(-ENODEV);
 
-   iommu_device_link(_handle->iommu, dev);
-
-   return 0;
+   return _handle->iommu;
 }
 
-static void gart_iommu_remove_device(struct device *dev)
+static void gart_iommu_release_device(struct device *dev)
 {
-   iommu_group_remove_device(dev);
-   iommu_device_unlink(_handle->iommu, dev);
 }
 
 static int gart_iommu_of_xlate(struct device *dev,
@@ -290,8 +278,8 @@ static const struct iommu_ops gart_iommu_ops = {
.domain_free= gart_iommu_domain_free,
.attach_dev = gart_iommu_attach_dev,
.detach_dev = gart_iommu_detach_dev,
-   .add_device = gart_iommu_add_device,
-   .remove_device  = gart_iommu_remove_device,
+   .probe_device   = gart_iommu_probe_device,
+   .release_device = gart_iommu_release_device,
.device_group   = generic_device_group,
.map= gart_iommu_map,
.unmap  = gart_iommu_unmap,
diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 63a147b623e6..7426b7666e2b 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -757,11 +757,10 @@ static int tegra_smmu_configure(struct tegra_smmu *smmu, 
struct device *dev,
return 0;
 }
 
-static int tegra_smmu_add_device(struct device *dev)
+static struct iommu_device *tegra_smmu_probe_device(struct device *dev)
 {
struct device_node *np = dev->of_node;
struct tegra_smmu *smmu = NULL;
-   struct iommu_group *group;
struct of_phandle_args args;
unsigned int index = 0;
int err;
@@ -774,7 +773,7 @@ static int tegra_smmu_add_device(struct device *dev)
of_node_put(args.np);
 
if (err < 0)
-   return err;
+   return ERR_PTR(err);
 
/*
 * Only a single IOMMU master interface is currently
@@ -783,8 +782,6 @@ static int tegra_smmu_add_device(struct device *dev)
 */
dev->archdata.iommu = smmu;
 
-   iommu_device_link(>iommu, dev);
-
break;
}
 
@@ -793,26 +790,14 @@ static int tegra_smmu_add_device(struct device *dev)
}
 
if (!smmu)
-   return -ENODEV;
-
-   group = iommu_group_get_for_dev(dev);
-   if (IS_ERR(group))
-   return PTR_ERR(group);
-
-   iommu_group_put(group);
+   return ERR_PTR(-ENODEV);
 
-   return 0;
+   return >iommu;
 }
 
-static void tegra_smmu_remove_device(struct device *dev)
+static void tegra_smmu_release_device(struct device *dev)
 {
-   struct tegra_smmu *smmu = dev->archdata.iommu;
-
-   if (smmu)
-   iommu_device_unlink(>iommu, dev);
-
dev->archdata.iommu = NULL;
-   iommu_group_remove_device(dev);
 }
 
 static const struct tegra_smmu_group_soc *
@@ -895,8 +880,8 @@ static const struct iommu_ops tegra_smmu_ops = {
.domain_free = tegra_smmu_domain_free,
.attach_dev = tegra_smmu_attach_dev,
.detach_dev = tegra_smmu_detach_dev,
-   .add_device = tegra_smmu_add_device,
-   .remove_device = tegra_smmu_remove_device,
+   .probe_device = tegra_smmu_probe_device,
+   .release_device = tegra_smmu_release_device,
.device_group = tegra_smmu_device_group,
.map = tegra_smmu_map,
.unmap = tegra_smmu_unmap,
@@ -1015,7 +1000,7 @@ struct tegra_smmu *tegra_smmu_probe(struct device *dev,
 * value. However the IOMMU registration process will attempt to add
 * all devices to the IOMMU when bus_set_iommu() is called. In order
 * not to rely on global variables to track the IOMMU instance, we
-* set it here so that it can be looked up from the .add_device()
+* set it here so that it can be looked up from the 

[RFC PATCH 25/34] iommu/qcom: Convert to probe/release_device() call-backs

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

Convert the QCOM IOMMU driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/qcom_iommu.c | 24 +++-
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
index 0e2a96467767..054e476ebd49 100644
--- a/drivers/iommu/qcom_iommu.c
+++ b/drivers/iommu/qcom_iommu.c
@@ -524,14 +524,13 @@ static bool qcom_iommu_capable(enum iommu_cap cap)
}
 }
 
-static int qcom_iommu_add_device(struct device *dev)
+static struct iommu_device *qcom_iommu_probe_device(struct device *dev)
 {
struct qcom_iommu_dev *qcom_iommu = to_iommu(dev);
-   struct iommu_group *group;
struct device_link *link;
 
if (!qcom_iommu)
-   return -ENODEV;
+   return ERR_PTR(-ENODEV);
 
/*
 * Establish the link between iommu and master, so that the
@@ -542,28 +541,19 @@ static int qcom_iommu_add_device(struct device *dev)
if (!link) {
dev_err(qcom_iommu->dev, "Unable to create device link between 
%s and %s\n",
dev_name(qcom_iommu->dev), dev_name(dev));
-   return -ENODEV;
+   return ERR_PTR(-ENODEV);
}
 
-   group = iommu_group_get_for_dev(dev);
-   if (IS_ERR(group))
-   return PTR_ERR(group);
-
-   iommu_group_put(group);
-   iommu_device_link(_iommu->iommu, dev);
-
-   return 0;
+   return _iommu->iommu;
 }
 
-static void qcom_iommu_remove_device(struct device *dev)
+static void qcom_iommu_release_device(struct device *dev)
 {
struct qcom_iommu_dev *qcom_iommu = to_iommu(dev);
 
if (!qcom_iommu)
return;
 
-   iommu_device_unlink(_iommu->iommu, dev);
-   iommu_group_remove_device(dev);
iommu_fwspec_free(dev);
 }
 
@@ -619,8 +609,8 @@ static const struct iommu_ops qcom_iommu_ops = {
.flush_iotlb_all = qcom_iommu_flush_iotlb_all,
.iotlb_sync = qcom_iommu_iotlb_sync,
.iova_to_phys   = qcom_iommu_iova_to_phys,
-   .add_device = qcom_iommu_add_device,
-   .remove_device  = qcom_iommu_remove_device,
+   .probe_device   = qcom_iommu_probe_device,
+   .release_device = qcom_iommu_release_device,
.device_group   = generic_device_group,
.of_xlate   = qcom_iommu_of_xlate,
.pgsize_bitmap  = SZ_4K | SZ_64K | SZ_1M | SZ_16M,
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 23/34] iommu/mediatek: Convert to probe/release_device() call-backs

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

Convert the Mediatek IOMMU driver to use the probe_device() and
release_device() call-backs of iommu_ops, so that the iommu core code
does the group and sysfs setup.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/mtk_iommu.c | 24 ++--
 1 file changed, 6 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 5f4d6df59cf6..2be96f1cdbd2 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -441,38 +441,26 @@ static phys_addr_t mtk_iommu_iova_to_phys(struct 
iommu_domain *domain,
return pa;
 }
 
-static int mtk_iommu_add_device(struct device *dev)
+static struct iommu_device *mtk_iommu_probe_device(struct device *dev)
 {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct mtk_iommu_data *data;
-   struct iommu_group *group;
 
if (!fwspec || fwspec->ops != _iommu_ops)
-   return -ENODEV; /* Not a iommu client device */
+   return ERR_PTR(-ENODEV); /* Not a iommu client device */
 
data = dev_iommu_priv_get(dev);
-   iommu_device_link(>iommu, dev);
 
-   group = iommu_group_get_for_dev(dev);
-   if (IS_ERR(group))
-   return PTR_ERR(group);
-
-   iommu_group_put(group);
-   return 0;
+   return >iommu;
 }
 
-static void mtk_iommu_remove_device(struct device *dev)
+static void mtk_iommu_release_device(struct device *dev)
 {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-   struct mtk_iommu_data *data;
 
if (!fwspec || fwspec->ops != _iommu_ops)
return;
 
-   data = dev_iommu_priv_get(dev);
-   iommu_device_unlink(>iommu, dev);
-
-   iommu_group_remove_device(dev);
iommu_fwspec_free(dev);
 }
 
@@ -526,8 +514,8 @@ static const struct iommu_ops mtk_iommu_ops = {
.flush_iotlb_all = mtk_iommu_flush_iotlb_all,
.iotlb_sync = mtk_iommu_iotlb_sync,
.iova_to_phys   = mtk_iommu_iova_to_phys,
-   .add_device = mtk_iommu_add_device,
-   .remove_device  = mtk_iommu_remove_device,
+   .probe_device   = mtk_iommu_probe_device,
+   .release_device = mtk_iommu_release_device,
.device_group   = mtk_iommu_device_group,
.of_xlate   = mtk_iommu_of_xlate,
.pgsize_bitmap  = SZ_4K | SZ_64K | SZ_1M | SZ_16M,
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 09/34] iommu: Keep a list of allocated groups in __iommu_probe_device()

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

This is needed to defer default_domain allocation for new IOMMU groups
until all devices have been added to the group.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7a385c18e1a5..18eb3623bd00 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -44,6 +44,7 @@ struct iommu_group {
int id;
struct iommu_domain *default_domain;
struct iommu_domain *domain;
+   struct list_head entry;
 };
 
 struct group_device {
@@ -184,7 +185,7 @@ static void dev_iommu_free(struct device *dev)
dev->iommu = NULL;
 }
 
-static int __iommu_probe_device(struct device *dev)
+static int __iommu_probe_device(struct device *dev, struct list_head 
*group_list)
 {
const struct iommu_ops *ops = dev->bus->iommu_ops;
struct iommu_device *iommu_dev;
@@ -204,6 +205,9 @@ static int __iommu_probe_device(struct device *dev)
}
iommu_group_put(group);
 
+   if (group_list && !group->default_domain && list_empty(>entry))
+   list_add_tail(>entry, group_list);
+
iommu_device_link(iommu_dev, dev);
 
return 0;
@@ -234,7 +238,7 @@ int iommu_probe_device(struct device *dev)
if (ops->probe_device) {
struct iommu_group *group;
 
-   ret = __iommu_probe_device(dev);
+   ret = __iommu_probe_device(dev, NULL);
 
/*
 * Try to allocate a default domain - needs support from the
@@ -567,6 +571,7 @@ struct iommu_group *iommu_group_alloc(void)
group->kobj.kset = iommu_group_kset;
mutex_init(>mutex);
INIT_LIST_HEAD(>devices);
+   INIT_LIST_HEAD(>entry);
BLOCKING_INIT_NOTIFIER_HEAD(>notifier);
 
ret = ida_simple_get(_group_ida, 0, 0, GFP_KERNEL);
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 03/34] iommu/amd: Implement iommu_ops->def_domain_type call-back

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

Implement the new def_domain_type call-back for the AMD IOMMU driver.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd_iommu.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 20cce366e951..73b4f84cf449 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2661,6 +2661,20 @@ static void amd_iommu_iotlb_sync(struct iommu_domain 
*domain,
amd_iommu_flush_iotlb_all(domain);
 }
 
+static int amd_iommu_def_domain_type(struct device *dev)
+{
+   struct iommu_dev_data *dev_data;
+
+   dev_data = get_dev_data(dev);
+   if (!dev_data)
+   return 0;
+
+   if (dev_data->iommu_v2)
+   return IOMMU_DOMAIN_IDENTITY;
+
+   return 0;
+}
+
 const struct iommu_ops amd_iommu_ops = {
.capable = amd_iommu_capable,
.domain_alloc = amd_iommu_domain_alloc,
@@ -2680,6 +2694,7 @@ const struct iommu_ops amd_iommu_ops = {
.pgsize_bitmap  = AMD_IOMMU_PGSIZES,
.flush_iotlb_all = amd_iommu_flush_iotlb_all,
.iotlb_sync = amd_iommu_iotlb_sync,
+   .def_domain_type = amd_iommu_def_domain_type,
 };
 
 /*
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 02/34] iommu: Add def_domain_type() callback in iommu_ops

2020-04-07 Thread Joerg Roedel
From: Sai Praneeth Prakhya 

Some devices are reqired to use a specific type (identity or dma)
of default domain when they are used with a vendor iommu. When the
system level default domain type is different from it, the vendor
iommu driver has to request a new default domain with
iommu_request_dma_domain_for_dev() and iommu_request_dm_for_dev()
in the add_dev() callback. Unfortunately, these two helpers only
work when the group hasn't been assigned to any other devices,
hence, some vendor iommu driver has to use a private domain if
it fails to request a new default one.

This adds def_domain_type() callback in the iommu_ops, so that
any special requirement of default domain for a device could be
aware by the iommu generic layer.

Signed-off-by: Sai Praneeth Prakhya 
Signed-off-by: Lu Baolu 
[ jroe...@suse.de: Added iommu_get_def_domain_type() function and use
   it to allocate the default domain ]
Co-developed-by: Joerg Roedel 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 20 +---
 include/linux/iommu.h |  6 ++
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index bfe011760ed1..5877abd9b693 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1361,21 +1361,35 @@ struct iommu_group *fsl_mc_device_group(struct device 
*dev)
 }
 EXPORT_SYMBOL_GPL(fsl_mc_device_group);
 
+static int iommu_get_def_domain_type(struct device *dev)
+{
+   const struct iommu_ops *ops = dev->bus->iommu_ops;
+   unsigned int type = 0;
+
+   if (ops->def_domain_type)
+   type = ops->def_domain_type(dev);
+
+   return (type == 0) ? iommu_def_domain_type : type;
+}
+
 static int iommu_alloc_default_domain(struct device *dev,
  struct iommu_group *group)
 {
struct iommu_domain *dom;
+   unsigned int type;
 
if (group->default_domain)
return 0;
 
-   dom = __iommu_domain_alloc(dev->bus, iommu_def_domain_type);
-   if (!dom && iommu_def_domain_type != IOMMU_DOMAIN_DMA) {
+   type = iommu_get_def_domain_type(dev);
+
+   dom = __iommu_domain_alloc(dev->bus, type);
+   if (!dom && type != IOMMU_DOMAIN_DMA) {
dom = __iommu_domain_alloc(dev->bus, IOMMU_DOMAIN_DMA);
if (dom) {
dev_warn(dev,
 "failed to allocate default IOMMU domain of 
type %u; falling back to IOMMU_DOMAIN_DMA",
-iommu_def_domain_type);
+type);
}
}
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 7ef8b0bda695..1f027b07e499 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -248,6 +248,10 @@ struct iommu_iotlb_gather {
  * @cache_invalidate: invalidate translation caches
  * @sva_bind_gpasid: bind guest pasid and mm
  * @sva_unbind_gpasid: unbind guest pasid and mm
+ * @def_domain_type: device default domain type, return value:
+ * - IOMMU_DOMAIN_IDENTITY: must use an identity domain
+ * - IOMMU_DOMAIN_DMA: must use a dma domain
+ * - 0: use the default setting
  * @pgsize_bitmap: bitmap of all possible supported page sizes
  * @owner: Driver module providing these ops
  */
@@ -318,6 +322,8 @@ struct iommu_ops {
 
int (*sva_unbind_gpasid)(struct device *dev, int pasid);
 
+   int (*def_domain_type)(struct device *dev);
+
unsigned long pgsize_bitmap;
struct module *owner;
 };
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 00/34] iommu: Move iommu_group setup to IOMMU core code

2020-04-07 Thread Joerg Roedel
Hi,

here is a patch-set to remove all calls of iommu_group_get_for_dev() from
the IOMMU drivers and move the per-device group setup and default domain
allocation into the IOMMU core code.

This eliminates some ugly back and forth between IOMMU core code and the
IOMMU drivers, where the driver called iommu_group_get_for_dev() which itself
called back into the driver.

The patch-set started as a "quick" Friday afternoon project to split the
IOMMU group creation and the allocation of the default domain, so that the
default domain is not allocated before all devices are added to the group.
In the end it took 1.5 weeks to get this in a reasonable shape, but now the
code (during bus probing) first adds all devices to their respective IOMMU
group before it determines the default domain type and then allocates it for
the group.

It turned out that this required to remove the calls of
iommu_group_get_for_dev() from the IOMMU drivers. While at it, the calls to
iommu_device_link()/unlink() where also moved out of the drivers, which
required a different interface than add_device()/remove_device(). The result
is the new probe_device()/release_device() interface, where the driver just
does its own setup and then returns the iommu_device which belongs to the
device being probed.

There is certainly more room for cleanups, but I think this is a good start
to simplify the code flow during IOMMU device probing.  It is also a more
robust base for the pending patch-sets which implement per-group default
domain types and the removal of the private domains from the Intel VT-d
driver.

With regards to testing, I verified this code works on three IOMMUs:

- AMD-Vi
- Intel VT-d (but there might be breakages on some hardware, the
  patches to remove the private domain handling from the VT-d driver
  should be rebased to these patches)
- ARM-SMMU-v3 (as emulated by QEMU)

Most driver conversions to the probe_device()/release_device() interface
were trivial, but there were also some hard nuts, which I am not sure still
work. The more difficult drivers were:

- ARM-SMMU-v2
- OMAP
- Renesas
- Mediatek IOMMU v1
- Exynos

It would be great if the changes could be tested (and possibly fixed) on
those IOMMUs, as I can't do testing on them.

The patches are based on the current iommu/next branch, I will rebase them
to v5.7-rc1 when it comes out. A branch with these patches applied can be
found here:


https://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git/log/?h=iommu-probe-device

Please review and test these changes and let me know what breaks.

Thanks,

Joerg

Joerg Roedel (33):
  iommu: Move default domain allocation to separate function
  iommu/amd: Implement iommu_ops->def_domain_type call-back
  iommu/vt-d: Wire up iommu_ops->def_domain_type
  iommu/amd: Remove dma_mask check from check_device()
  iommu/amd: Return -ENODEV in add_device when device is not handled by
IOMMU
  iommu: Add probe_device() and remove_device() call-backs
  iommu: Move default domain allocation to iommu_probe_device()
  iommu: Keep a list of allocated groups in __iommu_probe_device()
  iommu: Move new probe_device path to separate function
  iommu: Split off default domain allocation from group assignment
  iommu: Move iommu_group_create_direct_mappings() out of
iommu_group_add_device()
  iommu: Export bus_iommu_probe() and make is safe for re-probing
  iommu/amd: Remove dev_data->passthrough
  iommu/amd: Convert to probe/release_device() call-backs
  iommu/vt-d: Convert to probe/release_device() call-backs
  iommu/arm-smmu: Store device instead of group in arm_smmu_s2cr
  iommu/arm-smmu: Convert to probe/release_device() call-backs
  iommu/pamu: Convert to probe/release_device() call-backs
  iommu/s390: Convert to probe/release_device() call-backs
  iommu/virtio: Convert to probe/release_device() call-backs
  iommu/msm: Convert to probe/release_device() call-backs
  iommu/mediatek: Convert to probe/release_device() call-backs
  iommu/mediatek-v1 Convert to probe/release_device() call-backs
  iommu/qcom: Convert to probe/release_device() call-backs
  iommu/rockchip: Convert to probe/release_device() call-backs
  iommu/tegra: Convert to probe/release_device() call-backs
  iommu/renesas: Convert to probe/release_device() call-backs
  iommu/omap: Remove orphan_dev tracking
  iommu/omap: Convert to probe/release_device() call-backs
  iommu/exynos: Create iommu_device in struct exynos_iommu_owner
  iommu/exynos: Convert to probe/release_device() call-backs
  iommu: Remove add_device()/remove_device() code-paths
  iommu: Unexport iommu_group_get_for_dev()

Sai Praneeth Prakhya (1):
  iommu: Add def_domain_type() callback in iommu_ops

 drivers/iommu/amd_iommu.c   |  97 
 drivers/iommu/amd_iommu_types.h |   1 -
 drivers/iommu/arm-smmu-v3.c |  42 +---
 drivers/iommu/arm-smmu.c|  44 ++--
 drivers/iommu/exynos-iommu.c| 113 

[RFC PATCH 01/34] iommu: Move default domain allocation to separate function

2020-04-07 Thread Joerg Roedel
From: Joerg Roedel 

Move the code out of iommu_group_get_for_dev() into a separate
function.

Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 74 ++-
 1 file changed, 45 insertions(+), 29 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2b471419e26c..bfe011760ed1 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1361,6 +1361,41 @@ struct iommu_group *fsl_mc_device_group(struct device 
*dev)
 }
 EXPORT_SYMBOL_GPL(fsl_mc_device_group);
 
+static int iommu_alloc_default_domain(struct device *dev,
+ struct iommu_group *group)
+{
+   struct iommu_domain *dom;
+
+   if (group->default_domain)
+   return 0;
+
+   dom = __iommu_domain_alloc(dev->bus, iommu_def_domain_type);
+   if (!dom && iommu_def_domain_type != IOMMU_DOMAIN_DMA) {
+   dom = __iommu_domain_alloc(dev->bus, IOMMU_DOMAIN_DMA);
+   if (dom) {
+   dev_warn(dev,
+"failed to allocate default IOMMU domain of 
type %u; falling back to IOMMU_DOMAIN_DMA",
+iommu_def_domain_type);
+   }
+   }
+
+   if (!dom)
+   return -ENOMEM;
+
+   group->default_domain = dom;
+   if (!group->domain)
+   group->domain = dom;
+
+   if (!iommu_dma_strict) {
+   int attr = 1;
+   iommu_domain_set_attr(dom,
+ DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+ );
+   }
+
+   return 0;
+}
+
 /**
  * iommu_group_get_for_dev - Find or create the IOMMU group for a device
  * @dev: target device
@@ -1393,40 +1428,21 @@ struct iommu_group *iommu_group_get_for_dev(struct 
device *dev)
 
/*
 * Try to allocate a default domain - needs support from the
-* IOMMU driver.
+* IOMMU driver. There are still some drivers which don't support
+* default domains, so the return value is not yet checked.
 */
-   if (!group->default_domain) {
-   struct iommu_domain *dom;
-
-   dom = __iommu_domain_alloc(dev->bus, iommu_def_domain_type);
-   if (!dom && iommu_def_domain_type != IOMMU_DOMAIN_DMA) {
-   dom = __iommu_domain_alloc(dev->bus, IOMMU_DOMAIN_DMA);
-   if (dom) {
-   dev_warn(dev,
-"failed to allocate default IOMMU 
domain of type %u; falling back to IOMMU_DOMAIN_DMA",
-iommu_def_domain_type);
-   }
-   }
-
-   group->default_domain = dom;
-   if (!group->domain)
-   group->domain = dom;
-
-   if (dom && !iommu_dma_strict) {
-   int attr = 1;
-   iommu_domain_set_attr(dom,
- DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
- );
-   }
-   }
+   iommu_alloc_default_domain(dev, group);
 
ret = iommu_group_add_device(group, dev);
-   if (ret) {
-   iommu_group_put(group);
-   return ERR_PTR(ret);
-   }
+   if (ret)
+   goto out_put_group;
 
return group;
+
+out_put_group:
+   iommu_group_put(group);
+
+   return ERR_PTR(ret);
 }
 EXPORT_SYMBOL(iommu_group_get_for_dev);
 
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-07 Thread Alex Williamson
On Tue, 7 Apr 2020 04:26:23 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson 
> > Sent: Saturday, April 4, 2020 1:26 AM  
> [...]
> > > > > + if (!pasid_cap.control_reg.paside) {
> > > > > + pr_debug("%s: its PF's PASID capability is not 
> > > > > enabled\n",
> > > > > + dev_name(>pdev->dev));
> > > > > + ret = 0;
> > > > > + goto out;
> > > > > + }  
> > > >
> > > > What happens if the PF's PASID gets disabled while we're using it??  
> > >
> > > This is actually the open I highlighted in cover letter. Per the reply
> > > from Baolu, this seems to be an open for bare-metal all the same.
> > > https://lkml.org/lkml/2020/3/31/95  
> > 
> > Seems that needs to get sorted out before we can expose this.  Maybe
> > some sort of registration with the PF driver that PASID is being used
> > by a VF so it cannot be disabled?  
> 
> I guess we may do vSVA for PF first, and then adding VF vSVA later
> given above additional need. It's not necessarily to enable both
> in one step.
> 
> [...]
> > > > > @@ -1604,6 +1901,18 @@ static int vfio_ecap_init(struct  
> > vfio_pci_device *vdev)  
> > > > >   if (!ecaps)
> > > > >   *(u32 *)>vconfig[PCI_CFG_SPACE_SIZE] = 0;
> > > > >
> > > > > +#ifdef CONFIG_PCI_ATS
> > > > > + if (pdev->is_virtfn) {
> > > > > + struct pci_dev *physfn = pdev->physfn;
> > > > > +
> > > > > + ret = vfio_pci_add_emulated_cap_for_vf(vdev,
> > > > > + physfn, epos_max, prev);
> > > > > + if (ret)
> > > > > + pr_info("%s, failed to add special caps for VF 
> > > > > %s\n",
> > > > > + __func__, dev_name(>pdev->dev));
> > > > > + }
> > > > > +#endif  
> > > >
> > > > I can only imagine that we should place the caps at the same location
> > > > they exist on the PF, we don't know what hidden registers might be
> > > > hiding in config space.  
> 
> Is there vendor guarantee that hidden registers will locate at the
> same offset between PF and VF config space? 

I'm not sure if the spec really precludes hidden registers, but the
fact that these registers are explicitly outside of the capability
chain implies they're only intended for device specific use, so I'd say
there are no guarantees about anything related to these registers.

FWIW, vfio started out being more strict about restricting config space
access to defined capabilities, until...

commit a7d1ea1c11b33bda2691f3294b4d735ed635535a
Author: Alex Williamson 
Date:   Mon Apr 1 09:04:12 2013 -0600

vfio-pci: Enable raw access to unassigned config space

Devices like be2net hide registers between the gaps in capabilities
and architected regions of PCI config space.  Our choices to support
such devices is to either build an ever growing and unmanageable white
list or rely on hardware isolation to protect us.  These registers are
really no different than MMIO or I/O port space registers, which we
don't attempt to regulate, so treat PCI config space in the same way.

> > > but we are not sure whether the same location is available on VF. In
> > > this patch, it actually places the emulated cap physically behind the
> > > cap which lays farthest (its offset is largest) within VF's config space
> > > as the PCIe caps are linked in a chain.  
> > 
> > But, as we've found on Broadcom NICs (iirc), hardware developers have a
> > nasty habit of hiding random registers in PCI config space, outside of
> > defined capabilities.  I feel like IGD might even do this too, is that
> > true?  So I don't think we can guarantee that just because a section of
> > config space isn't part of a defined capability that its unused.  It
> > only means that it's unused by common code, but it might have device
> > specific purposes.  So of the PCIe spec indicates that VFs cannot
> > include these capabilities and virtialization software needs to
> > emulate them, we need somewhere safe to place them in config space, and
> > simply placing them off the end of known capabilities doesn't give me
> > any confidence.  Also, hardware has no requirement to make compact use
> > of extended config space.  The first capability must be at 0x100, the
> > very next capability could consume all the way to the last byte of the
> > 4K extended range, and the next link in the chain could be somewhere in
> > the middle.  Thanks,
> >   
> 
> Then what would be a viable option? Vendor nasty habit implies
> no standard, thus I don't see how VFIO can find a safe location
> by itself. Also curious how those hidden registers are identified
> by VFIO and employed with proper r/w policy today. If sort of quirks
> are used, then could such quirk way be extended to also carry
> the information about vendor specific safe location? When no
> such quirk info is provided (the majority case), VFIO then finds
> out a free location to carry the new cap.

See above 

Re: [RFC PATCH] iommu/amd: fix a race in fetch_pte()

2020-04-07 Thread Qian Cai


> On Apr 6, 2020, at 10:12 PM, Qian Cai  wrote:
> 
> fetch_pte() could race with increase_address_space() because it held no
> lock from iommu_unmap_page(). On the CPU that runs fetch_pte() it could
> see a stale domain->pt_root and a new increased domain->mode from
> increase_address_space(). As the result, it could trigger invalid
> accesses later on. Fix it by using a pair of smp_[w|r]mb in those
> places.

After further testing, the change along is insufficient. What I am chasing right
now is the swap device will go offline after heavy memory pressure below. The
symptom is similar to what we have in the commit,

754265bcab78 (“iommu/amd: Fix race in increase_address_space()”)

Apparently, it is no possible to take the domain->lock in fetch_pte() because it
could sleep.

Thoughts?

[ 7292.727377][  T814] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xffd8 flags=0x0010]
[ 7292.740571][  T814] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xffd81000 flags=0x0010]
[ 7292.753268][  T814] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xffd81900 flags=0x0010]
[ 7292.766078][  T814] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xffd81d00 flags=0x0010]
[ 7292.778447][  T814] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xffd82000 flags=0x0010]
[ 7292.790724][  T814] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xffd82400 flags=0x0010]
[ 7292.803195][  T814] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xffd82800 flags=0x0010]
[ 7292.815664][  T814] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xffd82c00 flags=0x0010]
[ 7292.828089][  T814] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xffd83000 flags=0x0010]
[ 7292.840468][  T814] smartpqi :23:00.0: AMD-Vi: Event logged 
[IO_PAGE_FAULT domain=0x0027 address=0xffd83400 flags=0x0010]
[ 7292.852795][  T814] AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 
domain=0x0027 address=0xffd83800 flags=0x0010]
[ 7292.864566][  T814] AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 
domain=0x0027 address=0xffd83c00 flags=0x0010]
[ 7326.583908][C8] smartpqi :23:00.0: controller is offline: status 
code 0x14803
[ 7326.592386][C8] smartpqi :23:00.0: controller offline
[ 7326.663728][   C66] sd 0:1:0:0: [sda] tag#467 UNKNOWN(0x2003) Result: 
hostbyte=0x01 driverbyte=0x00 cmd_age=6s
[ 7326.664321][ T2738] smartpqi :23:00.0: resetting scsi 0:1:0:0
[ 7326.673849][   C66] sd 0:1:0:0: [sda] tag#467 CDB: opcode=0x28 28 00 02 ee 
2e 60 00 00 08 00
[ 7326.680118][ T2738] smartpqi :23:00.0: reset of scsi 0:1:0:0: FAILED
[ 7326.688612][   C66] blk_update_request: I/O error, dev sda, sector 49163872 
op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 7326.695456][ T2738] sd 0:1:0:0: Device offlined - not ready after error 
recovery
[ 7326.706632][   C66] Read-error on swap-device (254:1:45833824)
[ 7326.714030][ T2738] sd 0:1:0:0: Device offlined - not ready after error 
recovery
[ 7326.723382][T45523] sd 0:1:0:0: rejecting I/O to offline device
[ 7326.727352][ T2738] sd 0:1:0:0: Device offlined - not ready after error 
recovery
[ 7326.727379][ T2738] sd 0:1:0:0: Device offlined - not ready after error 
recovery
[ 7326.727442][ T2738] sd 0:1:0:0: Device offlined - not ready after error 
recovery

> 
> kernel BUG at drivers/iommu/amd_iommu.c:1704!
> BUG_ON(unmapped && !is_power_of_2(unmapped));
> Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 
> 07/10/2019
> RIP: 0010:amd_iommu_unmap+0x1b2/0x1d0
> Call Trace:
>  
>  __iommu_unmap+0x106/0x320
>  iommu_unmap_fast+0xe/0x10
>  __iommu_dma_unmap+0xdc/0x1a0
>  iommu_dma_unmap_sg+0xae/0xd0
>  scsi_dma_unmap+0xe7/0x150
>  pqi_raid_io_complete+0x37/0x60 [smartpqi]
>  pqi_irq_handler+0x1fc/0x13f0 [smartpqi]
>  __handle_irq_event_percpu+0x78/0x4f0
>  handle_irq_event_percpu+0x70/0x100
>  handle_irq_event+0x5a/0x8b
>  handle_edge_irq+0x10c/0x370
>  do_IRQ+0x9e/0x1e0
>  common_interrupt+0xf/0xf
>  
> 
> Signed-off-by: Qian Cai 
> ---
> drivers/iommu/amd_iommu.c | 9 +
> 1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 20cce366e951..22328a23335f 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -1434,6 +1434,11 @@ static bool increase_address_space(struct 
> protection_domain *domain,
>   *pte = PM_LEVEL_PDE(domain->mode,
>   iommu_virt_to_phys(domain->pt_root));
>   domain->pt_root  = pte;
> + /*
> +  * Make sure fetch_pte() will see the new domain->pt_root before it
> +  * snapshots domain->mode.
> +  

[git pull] IOMMU Updates for Linux v5.7

2020-04-07 Thread Joerg Roedel
Hi Linus,

The following changes since commit 16fbf79b0f83bc752cee8589279f1ebfe57b3b6e:

  Linux 5.6-rc7 (2020-03-22 18:31:56 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git 
tags/iommu-updates-v5.7

for you to fetch changes up to ff68eb23308e6538ec7864c83d39540f423bbe90:

  Merge branches 'iommu/fixes', 'arm/qcom', 'arm/omap', 'arm/smmu', 'x86/amd', 
'x86/vt-d', 'virtio' and 'core' into next (2020-03-27 11:33:27 +0100)


IOMMU Updates for Linux v5.7

Including:

- ARM-SMMU support for the TLB range invalidation command in
  SMMUv3.2.

- ARM-SMMU introduction of command batching helpers to batch up
  CD and ATC invalidation.

- ARM-SMMU support for PCI PASID, along with necessary PCI
  symbol exports.

- Introduce a generic (actually rename an existing) IOMMU
  related pointer in struct device and reduce the IOMMU related
  pointers.

- Some fixes for the OMAP IOMMU driver to make it build on 64bit
  architectures.

- Various smaller fixes and improvements.


Adrian Huang (1):
  iommu/amd: Fix the configuration of GCR3 table root pointer

Gustavo A. R. Silva (1):
  iommu/qcom: Replace zero-length array with flexible-array member

Jacob Pan (3):
  iommu/vt-d: Fix page request descriptor size
  iommu/vt-d: Fix mm reference leak
  iommu/vt-d: Add build dependency on IOASID

Jean-Philippe Brucker (9):
  iommu/virtio: Build virtio-iommu as module
  PCI/ATS: Export symbols of PASID functions
  iommu/arm-smmu-v3: Add support for PCI PASID
  iommu/arm-smmu-v3: Write level-1 descriptors atomically
  iommu/arm-smmu-v3: Add command queue batching helpers
  iommu/arm-smmu-v3: Batch context descriptor invalidation
  iommu/virtio: Fix sparse warning
  iommu/virtio: Fix freeing of incomplete domains
  iommu/virtio: Reject IOMMU page granule larger than PAGE_SIZE

Joerg Roedel (17):
  Merge tag 'arm-smmu-updates' of git://git.kernel.org/.../will/linux into 
arm/smmu
  iommu: Define dev_iommu_fwspec_get() for !CONFIG_IOMMU_API
  ACPI/IORT: Remove direct access of dev->iommu_fwspec
  drm/msm/mdp5: Remove direct access of dev->iommu_fwspec
  iommu/tegra-gart: Remove direct access of dev->iommu_fwspec
  iommu: Rename struct iommu_param to dev_iommu
  iommu: Move iommu_fwspec to struct dev_iommu
  iommu/arm-smmu: Fix uninitilized variable warning
  iommu: Introduce accessors for iommu private data
  iommu/arm-smmu-v3: Use accessor functions for iommu private data
  iommu/arm-smmu: Use accessor functions for iommu private data
  iommu/renesas: Use accessor functions for iommu private data
  iommu/mediatek: Use accessor functions for iommu private data
  iommu/qcom: Use accessor functions for iommu private data
  iommu/virtio: Use accessor functions for iommu private data
  iommu: Move fwspec->iommu_priv to struct dev_iommu
  Merge branches 'iommu/fixes', 'arm/qcom', 'arm/omap', 'arm/smmu', 
'x86/amd', 'x86/vt-d', 'virtio' and 'core' into next

Krzysztof Kozlowski (4):
  iommu/omap: Fix pointer cast -Wpointer-to-int-cast warnings on 64 bit
  iommu/omap: Fix printing format for size_t on 64-bit
  iommu/omap: Fix -Woverflow warnings when compiling on 64-bit architectures
  iommu: Enable compile testing for some of drivers

Qian Cai (1):
  iommu/vt-d: Silence RCU-list debugging warning in dmar_find_atsr()

Rob Herring (2):
  iommu/arm-smmu-v3: Batch ATC invalidation commands
  iommu/arm-smmu-v3: Add SMMUv3.2 range invalidation support

Robin Murphy (3):
  iommu: Use C99 flexible array in fwspec
  MAINTAINERS: Cover Arm SMMU DT bindings
  iommu/arm-smmu: Refactor master_cfg/fwspec usage

 MAINTAINERS  |   1 +
 drivers/acpi/arm64/iort.c|   6 +-
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c |   2 +-
 drivers/iommu/Kconfig|  21 +--
 drivers/iommu/amd_iommu_types.h  |   2 +-
 drivers/iommu/arm-smmu-v3.c  | 214 ++-
 drivers/iommu/arm-smmu.c |  55 
 drivers/iommu/intel-iommu.c  |   3 +-
 drivers/iommu/intel-svm.c|   9 +-
 drivers/iommu/iommu.c|  46 ---
 drivers/iommu/ipmmu-vmsa.c   |   7 +-
 drivers/iommu/mtk_iommu.c|  13 +-
 drivers/iommu/mtk_iommu_v1.c |  14 +-
 drivers/iommu/omap-iommu.c   |  10 +-
 drivers/iommu/omap-iopgtable.h   |   3 +-
 drivers/iommu/qcom_iommu.c   |  63 +
 drivers/iommu/tegra-gart.c   |   2 +-
 drivers/iommu/virtio-iommu.c |  42 +++---
 drivers/pci/ats.c|   4 +
 

Re: [PATCH v1 1/8] vfio: Add VFIO_IOMMU_PASID_REQUEST(alloc/free)

2020-04-07 Thread Alex Williamson
On Tue, 7 Apr 2020 04:42:02 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson
> > Sent: Friday, April 3, 2020 11:14 PM
> > 
> > On Fri, 3 Apr 2020 05:58:55 +
> > "Tian, Kevin"  wrote:
> >   
> > > > From: Alex Williamson 
> > > > Sent: Friday, April 3, 2020 1:50 AM
> > > >
> > > > On Sun, 22 Mar 2020 05:31:58 -0700
> > > > "Liu, Yi L"  wrote:
> > > >  
> > > > > From: Liu Yi L 
> > > > >
> > > > > For a long time, devices have only one DMA address space from  
> > platform  
> > > > > IOMMU's point of view. This is true for both bare metal and directed-
> > > > > access in virtualization environment. Reason is the source ID of DMA 
> > > > > in
> > > > > PCIe are BDF (bus/dev/fnc ID), which results in only device 
> > > > > granularity
> > > > > DMA isolation. However, this is changing with the latest advancement 
> > > > > in
> > > > > I/O technology area. More and more platform vendors are utilizing the 
> > > > >  
> > > > PCIe  
> > > > > PASID TLP prefix in DMA requests, thus to give devices with multiple  
> > DMA  
> > > > > address spaces as identified by their individual PASIDs. For example,
> > > > > Shared Virtual Addressing (SVA, a.k.a Shared Virtual Memory) is able 
> > > > > to
> > > > > let device access multiple process virtual address space by binding 
> > > > > the
> > > > > virtual address space with a PASID. Wherein the PASID is allocated in
> > > > > software and programmed to device per device specific manner.  
> > Devices  
> > > > > which support PASID capability are called PASID-capable devices. If 
> > > > > such
> > > > > devices are passed through to VMs, guest software are also able to 
> > > > > bind
> > > > > guest process virtual address space on such devices. Therefore, the  
> > guest  
> > > > > software could reuse the bare metal software programming model,  
> > which  
> > > > > means guest software will also allocate PASID and program it to device
> > > > > directly. This is a dangerous situation since it has potential PASID
> > > > > conflicts and unauthorized address space access. It would be safer to
> > > > > let host intercept in the guest software's PASID allocation. Thus 
> > > > > PASID
> > > > > are managed system-wide.  
> > > >
> > > > Providing an allocation interface only allows for collaborative usage
> > > > of PASIDs though.  Do we have any ability to enforce PASID usage or can
> > > > a user spoof other PASIDs on the same BDF?  
> > >
> > > An user can access only PASIDs allocated to itself, i.e. the specific 
> > > IOASID
> > > set tied to its mm_struct.  
> > 
> > A user is only _supposed_ to access PASIDs allocated to itself.  AIUI
> > the mm_struct is used for managing the pool of IOASIDs from which the
> > user may allocate that PASID.  We also state that programming the PASID
> > into the device is device specific.  Therefore, are we simply trusting
> > the user to use a PASID that's been allocated to them when they program
> > the device?  If a user can program an arbitrary PASID into the device,
> > then what prevents them from attempting to access data from another
> > user via the device?   I think I've asked this question before, so if
> > there's a previous explanation or spec section I need to review, please
> > point me to it.  Thanks,
> >   
> 
> There are two scenarios:
> 
> (1) for PF/VF, the iommu driver maintains an individual PASID table per
> PDF. Although the PASID namespace is global, the per-BDF PASID table
> contains only valid entries for those PASIDs which are allocated to the
> mm_struct. The user is free to program arbitrary PASID into the assigned
> device, but using invalid PASIDs simply hit iommu fault.
> 
> (2) for mdev, multiple mdev instances share the same PASID table of
> the parent BDF. However, PASID programming is a privileged operation
> in multiplexing usage, thus must be mediated by mdev device driver. 
> The mediation logic will guarantee that only allocated PASIDs are 
> forwarded to the device. 

Thanks, I was confused about multiple tenants sharing a BDF when PASID
programming to the device is device specific, and therefore not
something we can virtualize.  However, the solution is device specific
virtualization via mdev.  Thus, any time we're sharing a BDF between
tenants, we must virtualize the PASID programming and therefore it must
be an mdev device currently.  If a tenant is the exclusive user of the
BDF, then no virtualization of the PASID programming is required.  I
think it's clear now (again).  Thanks,

Alex

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 05/10] iommu/ioasid: Create an IOASID set for host SVA use

2020-04-07 Thread Jean-Philippe Brucker
On Mon, Apr 06, 2020 at 08:33:53AM -0700, Jacob Pan wrote:
> Hi Jean,
> 
> On Wed, 1 Apr 2020 15:53:16 +0200
> Jean-Philippe Brucker  wrote:
> 
> > On Wed, Mar 25, 2020 at 10:55:26AM -0700, Jacob Pan wrote:
> > > Bare metal SVA allocates IOASIDs for native process addresses. This
> > > should be separated from VM allocated IOASIDs thus under its own
> > > set.
> > > 
> > > This patch creates a system IOASID set with its quota set to
> > > PID_MAX. This is a reasonable default in that SVM capable devices
> > > can only bind to limited user processes.  
> > 
> > Yes realistically there won't be more than PID_MAX_DEFAULT=0x8000
> > bound address spaces. My machine uses a PID_MAX of 4 million though,
> > so in theory more than 0x8000 processes may want a bond.
> Got it, I assume we can adjust the system set quota as necessary.
> 
> > On Arm the
> > limit of shared contexts per VM is currently a little less than
> > 0x1 (which is the number of CPU ASIDs).
> > 
> I guess shared contexts means shared address? then it makes sense
> #IOASID < #ASID.

Yes by shared contexts I mean shared address spaces. Theoretically #ASID <
#IOASID for us, because the max ASID size is 16-bit.

> 
> > But quotas are only necessary for VMs, when the host shares the PASID
> > space with them (which isn't a use-case for Arm systems as far as I
> > know, each VM gets its own PASID space).
> Is there a host-guest PASID translation? or the PASID used by the VM is
> physical PASID? When a page request comes in to SMMU, how does it know
> the owner of the PASID if PASID range can overlap between host and
> guest?

We assign PCI functions to VMs, so Page Requests are routed with
RID:PASID, not PASID alone. The SMMU finds the struct device associated
with the RID, and submits the fault with iommu_report_device_fault(). If
the VF is assigned to a VM, then the page request gets injected into the
VM, otherwise it uses the host IOPF handler

> > Could we have quota-free IOASID sets for the host?
> > 
> Yes, perhaps just add a flag such that the set has its own namespace.
> You mean have this quota-free IOASID set even co-exist with VMs? I still
> don't get how PRQ works.
> 
> That is not the use case for VT-d in that we have to have system-wide
> allocation for host PASIDs. We have enqcmd which can take a PASID from
> the per task MSR and deliver to multiple devices, so even though the
> PASID table is per device the PASID name space must be global.
> 
> > For the SMMU I'd like to allocate two sets, one SVA and one private
> > for auxiliary domains, and I don't think giving either a quota makes
> > much sense at the moment.
> I agree we don;t need the quota if we don't support guest SVA at the
> same time.
> 
> So the sva set and aux_domain set PASIDs have their own namespaces?

They share the same PASID space, but they store different objects
(mm_struct and context descriptor, respectively) so they need different
ioasid_set tokens.

> 
> > There can be systems using only SVA and
> > systems using only private PASIDs. I think it should be
> > first-come-first-served until admins want a knob to define a policy
> > themselves, based on cgroups for example.
> > 
> > > Signed-off-by: Jacob Pan 
> > > ---
> > >  drivers/iommu/intel-iommu.c | 8 +++-
> > >  drivers/iommu/ioasid.c  | 9 +
> > >  include/linux/ioasid.h  | 9 +
> > >  3 files changed, 25 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/iommu/intel-iommu.c
> > > b/drivers/iommu/intel-iommu.c index ec3fc121744a..af7a1ef7b31e
> > > 100644 --- a/drivers/iommu/intel-iommu.c
> > > +++ b/drivers/iommu/intel-iommu.c
> > > @@ -3511,8 +3511,14 @@ static int __init init_dmars(void)
> > >   goto free_iommu;
> > >  
> > >   /* PASID is needed for scalable mode irrespective to SVM */
> > > - if (intel_iommu_sm)
> > > + if (intel_iommu_sm) {
> > >   ioasid_install_capacity(intel_pasid_max_id);
> > > + /* We should not run out of IOASIDs at boot */
> > > + if (ioasid_alloc_system_set(PID_MAX_DEFAULT)) {
> > > + pr_err("Failed to enable host PASID
> > > allocator\n");
> > > + intel_iommu_sm = 0;
> > > + }
> > > + }
> > >  
> > >   /*
> > >* for each drhd
> > > diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> > > index 6265d2dbbced..9135af171a7c 100644
> > > --- a/drivers/iommu/ioasid.c
> > > +++ b/drivers/iommu/ioasid.c
> > > @@ -39,6 +39,9 @@ struct ioasid_data {
> > >  static ioasid_t ioasid_capacity;
> > >  static ioasid_t ioasid_capacity_avail;
> > >  
> > > +int system_ioasid_sid;
> > > +static DECLARE_IOASID_SET(system_ioasid);
> > > +
> > >  /* System capacity can only be set once */
> > >  void ioasid_install_capacity(ioasid_t total)
> > >  {
> > > @@ -51,6 +54,12 @@ void ioasid_install_capacity(ioasid_t total)
> > >  }
> > >  EXPORT_SYMBOL_GPL(ioasid_install_capacity);
> > >  
> > > +int ioasid_alloc_system_set(int quota)
> > > +{
> > > + return 

Re: [PATCH 03/10] iommu/ioasid: Introduce per set allocation APIs

2020-04-07 Thread Jean-Philippe Brucker
On Mon, Apr 06, 2020 at 01:02:45PM -0700, Jacob Pan wrote:
> > > + sdata = kzalloc(sizeof(*sdata), GFP_KERNEL);
> > > + if (!sdata)
> > > + return -ENOMEM;  
> > 
> > I don't understand why we need this structure at all, nor why we need
> > the SID. Users have already allocated an ioasid_set, so why not just
> > stick the content of ioasid_set_data in there, and pass the
> > ioasid_set pointer to ioasid_alloc()?
> > 
> 
> My thinking was that ioasid_set is an opaque user token, e.g. we use mm
> to identify a common set belong to a VM.
> 
> This sdata is an IOASID internal structure for managing & servicing per
> set data. If we let user fill in the content, some of the entries need
> to be managed by the IOASID code under a lock.

We don't have to let users fill the content. A bit like iommu_domain:
device drivers don't modify it, they pass it to iommu_map() rather than
passing a domain ID.

> IMO, not suitable to let user allocate and manage.
> 
> Perhaps we should rename struct ioasid_set to ioasid_set_token?

Is the token actually used anywhere?  As far as I can tell VFIO does its
own uniqueness check before calling ioasid_alloc_set(), and consumers of
notifications don't read the token.

> 
> /**
>  * struct ioasid_set_data - Meta data about ioasid_set
>  *
>  * @token:Unique to identify an IOASID set
>  * @xa:   XArray to store ioasid_set private ID to
> system-wide IOASID
>  *mapping
>  * @max_id:   Max number of IOASIDs can be allocated within the set
>  * @nr_id Number of IOASIDs allocated in the set
>  * @sid   ID of the set
>  */
> struct ioasid_set_data {
>   struct ioasid_set *token;
>   struct xarray xa;
>   int size;
>   int nr_ioasids;
>   int sid;
>   struct rcu_head rcu;
> };

How about we remove the current ioasid_set, call this structure ioasid_set
instead of ioasid_set_data, and have ioasid_alloc_set() return it, rather
than requiring users to allocate the ioasid_set themselves?

struct ioasid_set *ioasid_alloc_set(ioasid_t quota):

This way ioasid_set is opaque to users (we could have the definition in
ioasid.c), but it can be passed to ioasid_alloc() and avoids the lookup by
SID. Could also add the unique token as a void * argument to
ioasid_alloc_set(), if needed.

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host

2020-04-07 Thread Liu, Yi L
Hi Jean,

> From: Jean-Philippe Brucker < jean-phili...@linaro.org >
> Sent: Friday, April 3, 2020 4:35 PM
> Subject: Re: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
> 
> On Thu, Apr 02, 2020 at 08:05:29AM +, Liu, Yi L wrote:
> > > > > > static long vfio_iommu_type1_ioctl(void *iommu_data,
> > > > > > default:
> > > > > > return -EINVAL;
> > > > > > }
> > > > > > +
> > > > > > +   } else if (cmd == VFIO_IOMMU_BIND) {
> > > > >
> > > > > BIND what? VFIO_IOMMU_BIND_PASID sounds clearer to me.
> > > >
> > > > Emm, it's up to the flags to indicate bind what. It was proposed to
> > > > cover the three cases below:
> > > > a) BIND/UNBIND_GPASID
> > > > b) BIND/UNBIND_GPASID_TABLE
> > > > c) BIND/UNBIND_PROCESS
> > > > 
> > > > So it's called VFIO_IOMMU_BIND.
> > >
> > > but aren't they all about PASID related binding?
> >
> > yeah, I can rename it. :-)
> 
> I don't know if anyone intends to implement it, but SMMUv2 supports
> nesting translation without any PASID support. For that case the name
> VFIO_IOMMU_BIND_GUEST_PGTBL without "PASID" anywhere makes more sense.
> Ideally we'd also use a neutral name for the IOMMU API instead of
> bind_gpasid(), but that's easier to change later.

I agree VFIO_IOMMU_BIND is somehow not straight-forward. Especially, it may
cause confusion when thinking about VFIO_SET_IOMMU. How about using
VFIO_NESTING_IOMMU_BIND_STAGE1 to cover a) and b)? And has another
VFIO_BIND_PROCESS in future for the SVA bind case.

Regards,
Yi Liu

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device

2020-04-07 Thread Alexey Kardashevskiy



On 07/04/2020 03:17, Christoph Hellwig wrote:
> On Mon, Apr 06, 2020 at 11:25:09PM +1000, Alexey Kardashevskiy wrote:
 Do you see any serious problem with this approach? Thanks!
>>>
>>> Do you have a link to the whole branch?  The github UI is unfortunately
>>> unusable for that (or I'm missing something).
>>
>> The UI shows the branch but since I rebased and forcepushed it, it does
>> not. Here is the current one with:
>>
>> https://github.com/aik/linux/commits/dma-bypass.3
> 
> Ok, so we use the core bypass without persistent memory, and then
> have another bypass mode on top.  Not great, but I can't think
> of anything better.  Note that your checks for the map_sg case
> aren't very efficient - for one it would make sense to calculate
> the limit only once, 

Good points, I'll post revised version when you post your v3 of this.

> but also it would make sense to reuse the
> calculted diecect mapping addresses instead of doing another pass
> later on in the dma-direct code.

Probably but I wonder what kind of hardware we need to see the
difference. I might try, just need to ride to the office to plug the
cable in my 100GBit eth machines :) Thanks,


-- 
Alexey
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

2020-04-07 Thread Liu, Yi L
Hi Jean,

> From: Jean-Philippe Brucker 
> Sent: Friday, April 3, 2020 4:23 PM
> To: Auger Eric 
> userspace
> 
> On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote:
> > >>> header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
> > >>>VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 
> > >>> 1);
> @@ -2254,6 +2309,7
> > >>> @@ static int vfio_iommu_info_add_nesting_cap(struct
> > >> vfio_iommu *iommu,
> > >>> /* nesting iommu type supports PASID requests 
> > >>> (alloc/free) */
> > >>> nesting_cap->nesting_capabilities |= 
> > >>> VFIO_IOMMU_PASID_REQS;
> > >> What is the meaning for ARM?
> > >
> > > I think it's just a software capability exposed to userspace, on
> > > userspace side, it has a choice to use it or not. :-) The reason
> > > define it and report it in cap nesting is that I'd like to make the
> > > pasid alloc/free be available just for IOMMU with type
> > > VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not good
> > > for ARM. We can find a proper way to report the availability.
> >
> > Well it is more a question for jean-Philippe. Do we have a system wide
> > PASID allocation on ARM?
> 
> We don't, the PASID spaces are per-VM on Arm, so this function should consult 
> the
> IOMMU driver before setting flags. As you said on patch 3, nested doesn't
> necessarily imply PASID support. The SMMUv2 does not support PASID but does
> support nesting stages 1 and 2 for the IOVA space.
> SMMUv3 support of PASID depends on HW capabilities. So I think this needs to 
> be
> finer grained:
> 
> Does the container support:
> * VFIO_IOMMU_PASID_REQUEST?
>   -> Yes for VT-d 3
>   -> No for Arm SMMU
> * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL?
>   -> Yes for VT-d 3
>   -> Sometimes for SMMUv2
>   -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler due to
>  PASID tables being in GPA space.)
> * VFIO_IOMMU_BIND_PASID_TABLE?
>   -> No for VT-d
>   -> Sometimes for SMMUv3
> 
> Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support.

good summary. do you expect to see any 

> 
> > >>> +   nesting_cap->stage1_formats = formats;
> > >> as spotted by Kevin, since a single format is supported, rename
> > >
> > > ok, I was believing it may be possible on ARM or so. :-) will rename
> > > it.
> 
> Yes I don't think an u32 is going to cut it for Arm :( We need to describe 
> all sorts of
> capabilities for page and PASID tables (granules, GPA size, ASID/PASID size, 
> HW
> access/dirty, etc etc.) Just saying "Arm stage-1 format" wouldn't mean much. I
> guess we could have a secondary vendor capability for these?

Actually, I'm wondering if we can define some formats to stands for a set of
capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may indicates the 1st level
page table related caps (aw, a/d, SRE, EA and etc.). And vIOMMU can parse
the capabilities.

Regards,
Yi Liu

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu