Re: [PATCH v6 5/8] iommu: Add bounce page APIs
On Fri, Aug 16, 2019 at 10:45:13AM +0800, Lu Baolu wrote:
> Okay. I understand that adding these APIs in iommu.c is not a good idea.
> And, I also don't think merging the bounce buffer implementation into
> iommu_map() is feasible since iommu_map() is not DMA API centric.
>
> The bounce buffer implementation will eventually be part of DMA APIs
> defined in dma-iommu.c, but currently those APIs are not ready for x86
> use yet. So I will put them in the iommu/vt-d driver for the time being
> and will move them to dma-iommu.c later.

I think they are more or less ready actually, we just need more people
reviewing the conversions. Tom just reposted the AMD one, which will
need a few more reviews, and he has an older patchset for intel-iommu
as well that could use some more eyes.
Re: [PATCH v6 5/8] iommu: Add bounce page APIs
Hi Joerg,

On 8/15/19 11:48 PM, Joerg Roedel wrote:
> On Thu, Aug 15, 2019 at 02:15:32PM +0800, Lu Baolu wrote:
>> iommu_map/unmap() APIs don't have parameters for DMA direction and
>> attributes. These parameters are elementary for DMA APIs. Say, after
>> map, if the DMA direction is TO_DEVICE and a bounce buffer is used,
>> we must sync the data from the original DMA buffer to the bounce
>> buffer; in the opposite direction, if the DMA is FROM_DEVICE, before
>> unmap, we need to sync the data from the bounce buffer onto the
>> original buffer.
>
> The DMA direction from the DMA-API maps to the protections in
> iommu_map():
>
> 	DMA_FROM_DEVICE:	IOMMU_WRITE
> 	DMA_TO_DEVICE:		IOMMU_READ
> 	DMA_BIDIRECTIONAL:	IOMMU_READ | IOMMU_WRITE
>
> And for the sync, the DMA-API also has separate functions for either
> direction. So I don't see why these extra functions are needed in
> the IOMMU-API.

Okay. I understand that adding these APIs in iommu.c is not a good idea.
And, I also don't think merging the bounce buffer implementation into
iommu_map() is feasible since iommu_map() is not DMA API centric.

The bounce buffer implementation will eventually be part of DMA APIs
defined in dma-iommu.c, but currently those APIs are not ready for x86
use yet. So I will put them in the iommu/vt-d driver for the time being
and will move them to dma-iommu.c later.

Does this work for you?

Best regards,
Lu Baolu
Re: [PATCH v5 16/19] iommu/vt-d: Misc macro clean up for SVM
On Fri, 16 Aug 2019 00:17:44 +0300, Andy Shevchenko wrote:
> On Thu, Aug 15, 2019 at 11:52 PM Jacob Pan wrote:
> >
> > Use the combined macro for_each_svm_dev() to simplify SVM device
> > iteration and error checking.
> >
> > Suggested-by: Andy Shevchenko
> > Signed-off-by: Jacob Pan
> > Reviewed-by: Eric Auger
> > ---
> >  drivers/iommu/intel-svm.c | 85 +++++++++++++++++++-----------------
> >  1 file changed, 41 insertions(+), 44 deletions(-)
> >
> > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> > index 5a688a5..ea6f2e2 100644
> > --- a/drivers/iommu/intel-svm.c
> > +++ b/drivers/iommu/intel-svm.c
> > @@ -218,6 +218,10 @@ static const struct mmu_notifier_ops intel_mmuops = {
> >  static DEFINE_MUTEX(pasid_mutex);
> >  static LIST_HEAD(global_svm_list);
> >
> > +#define for_each_svm_dev(svm, dev) \
> > +	list_for_each_entry(sdev, &svm->devs, list) \
> > +	if (dev == sdev->dev) \
>
> This should be
> 	if (dev != sdev->dev) {} else
> and no trailing \ is needed.
>
> The rationale of the above form is to avoid
> 	for_each_foo() {
> 	} else {
> 		...WTF?!..
> 	}

I understand, but until we have the else {} case we don't have anything
to avoid. The current code only has simple positive logic.
> > +
> >  int intel_svm_bind_mm(struct device *dev, int *pasid, int flags,
> > 		      struct svm_dev_ops *ops)
> >  {
> >  	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> > @@ -263,15 +267,13 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
> >  			goto out;
> >  		}
> >
> > -		list_for_each_entry(sdev, &svm->devs, list) {
> > -			if (dev == sdev->dev) {
> > -				if (sdev->ops != ops) {
> > -					ret = -EBUSY;
> > -					goto out;
> > -				}
> > -				sdev->users++;
> > -				goto success;
> > +		for_each_svm_dev(svm, dev) {
> > +			if (sdev->ops != ops) {
> > +				ret = -EBUSY;
> > +				goto out;
> >  			}
> > +			sdev->users++;
> > +			goto success;
> >  		}
> >
> >  		break;
> > @@ -408,48 +410,43 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
> >  		goto out;
> >
> >  	svm = ioasid_find(NULL, pasid, NULL);
> > -	if (IS_ERR(svm)) {
> > +	if (IS_ERR_OR_NULL(svm)) {
> >  		ret = PTR_ERR(svm);
> >  		goto out;
> >  	}
> >
> > -	if (!svm)
> > -		goto out;
> > +	for_each_svm_dev(svm, dev) {
> > +		ret = 0;
> > +		sdev->users--;
> > +		if (!sdev->users) {
> > +			list_del_rcu(&sdev->list);
> > +			/* Flush the PASID cache and IOTLB for this device.
> > +			 * Note that we do depend on the hardware *not* using
> > +			 * the PASID any more. Just as we depend on other
> > +			 * devices never using PASIDs that they have no right
> > +			 * to use. We have a *shared* PASID table, because it's
> > +			 * large and has to be physically contiguous. So it's
> > +			 * hard to be as defensive as we might like. */
> > +			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
> > +			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0, !svm->mm);
> > +			kfree_rcu(sdev, rcu);
> > +
> > +			if (list_empty(&svm->devs)) {
> > +				ioasid_free(svm->pasid);
> > +				if (svm->mm)
> > +					mmu_notifier_unregister(&svm->notifier, svm->mm);
> >
> > -	list_for_each_entry(sdev, &svm->devs, list) {
> > -		if (dev == sdev->dev) {
> > -			ret = 0;
> > -			sdev->users--;
> > -			if (!sdev->users) {
> > -				list_del_rcu(&sdev->list);
> > -				/* Flush the PASID cache and IOTLB for this device.
> > -				 * Note that we do depend on the hardware *not* using
> > -				 * the PASID any more. Just as we depend on other
> > -				 * devices never using PASIDs that they have no right
> > -				 * to use. We have a *shared* PASID table, because it's
> > -				 * large and has to be physically
Re: [PATCH v5 4/7] PCI/ATS: Add PRI support for PCIe VF devices
On 8/15/19 3:20 PM, Bjorn Helgaas wrote:
> [+cc Joerg, David, iommu list: because IOMMU drivers are the only
> callers of pci_enable_pri() and pci_enable_pasid()]
>
> On Thu, Aug 01, 2019 at 05:06:01PM -0700, sathyanarayanan.kuppusw...@linux.intel.com wrote:
>> From: Kuppuswamy Sathyanarayanan
>>
>> When IOMMU tries to enable Page Request Interface (PRI) for a VF
>> device in iommu_enable_dev_iotlb(), it always fails because PRI
>> support for PCIe VF devices is currently broken. The current
>> implementation expects the given PCIe device (PF & VF) to implement
>> the PRI capability before enabling PRI support. But this assumption
>> is incorrect. As per PCIe spec r4.0, sec 9.3.7.11, all VFs associated
>> with a PF can only use the PRI of the PF and not implement it. Hence
>> we need to create an exception for handling PRI support for PCIe VF
>> devices.
>>
>> Also, since PRI is a shared resource between PF/VF, the following
>> rules should apply.
>>
>> 1. Use proper locking before accessing/modifying PF resources in VF
>>    PRI enable/disable calls.
>> 2. Use reference count logic to track the usage of PRI resources.
>> 3. Disable PRI only if the PRI reference count (pri_ref_cnt) is zero.
>
> Wait, why do we need this at all? I agree the spec says VFs may not
> implement PRI or PASID capabilities and that VFs use the PRI and PASID
> of the PF.
>
> But why do we need to support pci_enable_pri() and pci_enable_pasid()
> for VFs? There's nothing interesting we can *do* in the VF, and
> passing it off to the PF adds all this locking mess. For VFs, can we
> just make them do nothing or return -EINVAL? What functionality would
> we be missing if we did that?

Currently PRI/PASID capabilities are not enabled by default. The IOMMU
can enable PRI/PASID for the VF first (and not enable it for the PF).
In this case, doing nothing for the VF device will break the
functionality.

Also, PRI/PASID config options like "PRI Outstanding Page Request
Allocation", "PASID Execute Permission" or "PASID Privileged Mode" are
currently configured per device feature.
And hence there is a chance for VF/PF to use different values for these
options.

> (Obviously returning -EINVAL would require tweaks in the callers to
> either avoid the call for VFs or handle the -EINVAL gracefully.)
>
> > Cc: Ashok Raj
> > Cc: Keith Busch
> > Suggested-by: Ashok Raj
> > Signed-off-by: Kuppuswamy Sathyanarayanan
> >
> > ---
> >  drivers/pci/ats.c   | 143 ++++++++++++++++++++++++++++++++++----------
> >  include/linux/pci.h |   2 +
> >  2 files changed, 112 insertions(+), 33 deletions(-)
> >
> > diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> > index 1f4be27a071d..079dc544 100644
> > --- a/drivers/pci/ats.c
> > +++ b/drivers/pci/ats.c
> > @@ -189,6 +189,8 @@ void pci_pri_init(struct pci_dev *pdev)
> >  	if (pdev->is_virtfn)
> >  		return;
> >
> > +	mutex_init(&pdev->pri_lock);
> > +
> >  	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> >  	if (!pos)
> >  		return;
> > @@ -221,29 +223,57 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
> >  {
> >  	u16 control, status;
> >  	u32 max_requests;
> > +	int ret = 0;
> > +	struct pci_dev *pf = pci_physfn(pdev);
> >
> > -	if (WARN_ON(pdev->pri_enabled))
> > -		return -EBUSY;
> > +	mutex_lock(&pf->pri_lock);
> >
> > -	if (!pdev->pri_cap)
> > -		return -EINVAL;
> > +	if (WARN_ON(pdev->pri_enabled)) {
> > +		ret = -EBUSY;
> > +		goto pri_unlock;
> > +	}
> >
> > -	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, &status);
> > -	if (!(status & PCI_PRI_STATUS_STOPPED))
> > -		return -EBUSY;
> > +	if (!pf->pri_cap) {
> > +		ret = -EINVAL;
> > +		goto pri_unlock;
> > +	}
> > +
> > +	if (pdev->is_virtfn && pf->pri_enabled)
> > +		goto update_status;
> > +
> > +	/*
> > +	 * Before updating PRI registers, make sure there are no
> > +	 * outstanding PRI requests.
> > +	 */
> > +	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_STATUS, &status);
> > +	if (!(status & PCI_PRI_STATUS_STOPPED)) {
> > +		ret = -EBUSY;
> > +		goto pri_unlock;
> > +	}
> >
> > -	pci_read_config_dword(pdev, pdev->pri_cap + PCI_PRI_MAX_REQ,
> > -			      &max_requests);
> > +	pci_read_config_dword(pf, pf->pri_cap + PCI_PRI_MAX_REQ, &max_requests);
> >  	reqs = min(max_requests, reqs);
> > -	pdev->pri_reqs_alloc = reqs;
> > -	pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
> > +	pf->pri_reqs_alloc = reqs;
> > +	pci_write_config_dword(pf, pf->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
> >
> >  	control = PCI_PRI_CTRL_ENABLE;
> > -	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
> > +	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
> >
> > -	pdev->pri_enabled = 1;
> > +	/*
> > +	 * If PRI is not already enabled in the PF, increment the PF
> > +	 * pri_ref_cnt to track the usage of the PRI interface.
> > +	 */
> > +	if (pdev->is_virtfn && !pf->pri_enabled) {
> > +		atomic_inc(&pf->pri_ref_cnt);
> > +		pf->pri_enabled = 1;
> > +	}
> >
> > -	return 0;
> > +update_status:
Re: [PATCH v5 4/7] PCI/ATS: Add PRI support for PCIe VF devices
[+cc Joerg, David, iommu list: because IOMMU drivers are the only
callers of pci_enable_pri() and pci_enable_pasid()]

On Thu, Aug 01, 2019 at 05:06:01PM -0700, sathyanarayanan.kuppusw...@linux.intel.com wrote:
> From: Kuppuswamy Sathyanarayanan
>
> When IOMMU tries to enable Page Request Interface (PRI) for a VF device
> in iommu_enable_dev_iotlb(), it always fails because PRI support for
> PCIe VF devices is currently broken. The current implementation expects
> the given PCIe device (PF & VF) to implement the PRI capability before
> enabling PRI support. But this assumption is incorrect. As per PCIe
> spec r4.0, sec 9.3.7.11, all VFs associated with a PF can only use the
> PRI of the PF and not implement it. Hence we need to create an
> exception for handling PRI support for PCIe VF devices.
>
> Also, since PRI is a shared resource between PF/VF, the following rules
> should apply.
>
> 1. Use proper locking before accessing/modifying PF resources in VF
>    PRI enable/disable calls.
> 2. Use reference count logic to track the usage of PRI resources.
> 3. Disable PRI only if the PRI reference count (pri_ref_cnt) is zero.

Wait, why do we need this at all? I agree the spec says VFs may not
implement PRI or PASID capabilities and that VFs use the PRI and PASID
of the PF.

But why do we need to support pci_enable_pri() and pci_enable_pasid()
for VFs? There's nothing interesting we can *do* in the VF, and passing
it off to the PF adds all this locking mess. For VFs, can we just make
them do nothing or return -EINVAL? What functionality would we be
missing if we did that?

(Obviously returning -EINVAL would require tweaks in the callers to
either avoid the call for VFs or handle the -EINVAL gracefully.)
> Cc: Ashok Raj
> Cc: Keith Busch
> Suggested-by: Ashok Raj
> Signed-off-by: Kuppuswamy Sathyanarayanan
>
> ---
>  drivers/pci/ats.c   | 143 ++++++++++++++++++++++++++++++++++----------
>  include/linux/pci.h |   2 +
>  2 files changed, 112 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index 1f4be27a071d..079dc544 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -189,6 +189,8 @@ void pci_pri_init(struct pci_dev *pdev)
>  	if (pdev->is_virtfn)
>  		return;
>
> +	mutex_init(&pdev->pri_lock);
> +
>  	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
>  	if (!pos)
>  		return;
> @@ -221,29 +223,57 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
>  {
>  	u16 control, status;
>  	u32 max_requests;
> +	int ret = 0;
> +	struct pci_dev *pf = pci_physfn(pdev);
>
> -	if (WARN_ON(pdev->pri_enabled))
> -		return -EBUSY;
> +	mutex_lock(&pf->pri_lock);
>
> -	if (!pdev->pri_cap)
> -		return -EINVAL;
> +	if (WARN_ON(pdev->pri_enabled)) {
> +		ret = -EBUSY;
> +		goto pri_unlock;
> +	}
>
> -	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, &status);
> -	if (!(status & PCI_PRI_STATUS_STOPPED))
> -		return -EBUSY;
> +	if (!pf->pri_cap) {
> +		ret = -EINVAL;
> +		goto pri_unlock;
> +	}
> +
> +	if (pdev->is_virtfn && pf->pri_enabled)
> +		goto update_status;
> +
> +	/*
> +	 * Before updating PRI registers, make sure there are no
> +	 * outstanding PRI requests.
> +	 */
> +	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_STATUS, &status);
> +	if (!(status & PCI_PRI_STATUS_STOPPED)) {
> +		ret = -EBUSY;
> +		goto pri_unlock;
> +	}
>
> -	pci_read_config_dword(pdev, pdev->pri_cap + PCI_PRI_MAX_REQ,
> -			      &max_requests);
> +	pci_read_config_dword(pf, pf->pri_cap + PCI_PRI_MAX_REQ, &max_requests);
>  	reqs = min(max_requests, reqs);
> -	pdev->pri_reqs_alloc = reqs;
> -	pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
> +	pf->pri_reqs_alloc = reqs;
> +	pci_write_config_dword(pf, pf->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
>
>  	control = PCI_PRI_CTRL_ENABLE;
> -	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
> +	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
>
> -	pdev->pri_enabled = 1;
> +	/*
> +	 * If PRI is not already enabled in the PF, increment the PF
> +	 * pri_ref_cnt to track the usage of the PRI interface.
> +	 */
> +	if (pdev->is_virtfn && !pf->pri_enabled) {
> +		atomic_inc(&pf->pri_ref_cnt);
> +		pf->pri_enabled = 1;
> +	}
>
> -	return 0;
> +update_status:
> +	atomic_inc(&pf->pri_ref_cnt);
> +	pdev->pri_enabled = 1;
> +pri_unlock:
> +	mutex_unlock(&pf->pri_lock);
> +	return ret;
> }
> EXPORT_SYMBOL_GPL(pci_enable_pri);
>
> @@ -256,18 +286,30 @@ EXPORT_SYMBOL_GPL(pci_enable_pri);
>  void pci_disable_pri(struct pci_dev *pdev)
>  {
>  	u16 control;
> +	struct pci_dev *pf = pci_physfn(pdev);
>
> -	if (WARN_ON(!pdev->pri_enabled))
> -
Re: [PATCH v5 16/19] iommu/vt-d: Misc macro clean up for SVM
On Thu, Aug 15, 2019 at 11:52 PM Jacob Pan wrote:
>
> Use the combined macro for_each_svm_dev() to simplify SVM device
> iteration and error checking.
>
> Suggested-by: Andy Shevchenko
> Signed-off-by: Jacob Pan
> Reviewed-by: Eric Auger
> ---
>  drivers/iommu/intel-svm.c | 85 +++++++++++++++++++-----------------
>  1 file changed, 41 insertions(+), 44 deletions(-)
>
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 5a688a5..ea6f2e2 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -218,6 +218,10 @@ static const struct mmu_notifier_ops intel_mmuops = {
>  static DEFINE_MUTEX(pasid_mutex);
>  static LIST_HEAD(global_svm_list);
>
> +#define for_each_svm_dev(svm, dev) \
> +	list_for_each_entry(sdev, &svm->devs, list) \
> +	if (dev == sdev->dev) \

This should be
	if (dev != sdev->dev) {} else
and no trailing \ is needed.

The rationale of the above form is to avoid
	for_each_foo() {
	} else {
		...WTF?!..
	}

> +
>  int intel_svm_bind_mm(struct device *dev, int *pasid, int flags,
> 		      struct svm_dev_ops *ops)
>  {
>  	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> @@ -263,15 +267,13 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
>  			goto out;
>  		}
>
> -		list_for_each_entry(sdev, &svm->devs, list) {
> -			if (dev == sdev->dev) {
> -				if (sdev->ops != ops) {
> -					ret = -EBUSY;
> -					goto out;
> -				}
> -				sdev->users++;
> -				goto success;
> +		for_each_svm_dev(svm, dev) {
> +			if (sdev->ops != ops) {
> +				ret = -EBUSY;
> +				goto out;
>  			}
> +			sdev->users++;
> +			goto success;
>  		}
>
>  		break;
> @@ -408,48 +410,43 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
>  		goto out;
>
>  	svm = ioasid_find(NULL, pasid, NULL);
> -	if (IS_ERR(svm)) {
> +	if (IS_ERR_OR_NULL(svm)) {
>  		ret = PTR_ERR(svm);
>  		goto out;
>  	}
>
> -	if (!svm)
> -		goto out;
> +	for_each_svm_dev(svm, dev) {
> +		ret = 0;
> +		sdev->users--;
> +		if (!sdev->users) {
> +			list_del_rcu(&sdev->list);
> +			/* Flush the PASID cache and IOTLB for this device.
> +			 * Note that we do depend on the hardware *not* using
> +			 * the PASID any more. Just as we depend on other
> +			 * devices never using PASIDs that they have no right
> +			 * to use. We have a *shared* PASID table, because it's
> +			 * large and has to be physically contiguous. So it's
> +			 * hard to be as defensive as we might like. */
> +			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
> +			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0, !svm->mm);
> +			kfree_rcu(sdev, rcu);
> +
> +			if (list_empty(&svm->devs)) {
> +				ioasid_free(svm->pasid);
> +				if (svm->mm)
> +					mmu_notifier_unregister(&svm->notifier, svm->mm);
>
> -	list_for_each_entry(sdev, &svm->devs, list) {
> -		if (dev == sdev->dev) {
> -			ret = 0;
> -			sdev->users--;
> -			if (!sdev->users) {
> -				list_del_rcu(&sdev->list);
> -				/* Flush the PASID cache and IOTLB for this device.
> -				 * Note that we do depend on the hardware *not* using
> -				 * the PASID any more. Just as we depend on other
> -				 * devices never using PASIDs that they have no right
> -				 * to use. We have a *shared* PASID table, because it's
> -				 * large and has to be physically contiguous. So it's
> -				 * hard to be as defensive as we might like. */
> -				intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
> -				intel_flush_svm_range_dev(svm, sdev, 0, -1, 0, !svm->mm);
> -				kfree_rcu(sdev, rcu);
> -
> -
[PATCH v5 07/19] iommu: Add I/O ASID allocator
From: Jean-Philippe Brucker

Some devices might support multiple DMA address spaces, in particular
those that have the PCI PASID feature. PASID (Process Address Space ID)
allows sharing process address spaces with devices (SVA), partitioning
a device into VM-assignable entities (VFIO mdev), or simply providing
multiple DMA address spaces to kernel drivers. Add a global PASID
allocator usable by different drivers at the same time. Name it I/O
ASID to avoid confusion with ASIDs allocated by arch code, which are
usually a separate ID space.

The IOASID space is global. Each device can have its own PASID space,
but by convention the IOMMU ended up having a global PASID space, so
that with SVA, each mm_struct is associated with a single PASID.

The allocator is primarily used by the IOMMU subsystem, but on rare
occasions drivers would like to allocate PASIDs for devices that
aren't managed by an IOMMU, using the same ID space as the IOMMU.

Signed-off-by: Jean-Philippe Brucker
Signed-off-by: Jacob Pan
---
 drivers/iommu/Kconfig  |   4 ++
 drivers/iommu/Makefile |   1 +
 drivers/iommu/ioasid.c | 151 +++++++++++++++++++++++++++++
 include/linux/ioasid.h |  47 ++++++++++
 4 files changed, 203 insertions(+)
 create mode 100644 drivers/iommu/ioasid.c
 create mode 100644 include/linux/ioasid.h

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index e15cdcd..0ade8a0 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -3,6 +3,10 @@
 config IOMMU_IOVA
 	tristate

+# The IOASID library may also be used by non-IOMMU_API users
+config IOASID
+	tristate
+
 # IOMMU_API always gets selected by whoever wants it.
 config IOMMU_API
 	bool

diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index f13f36a..011429e 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -7,6 +7,7 @@
 obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
+obj-$(CONFIG_IOASID) += ioasid.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
 obj-$(CONFIG_OF_IOMMU) += of_iommu.o
 obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o

diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
new file mode 100644
index 000..6fbea76
--- /dev/null
+++ b/drivers/iommu/ioasid.c
@@ -0,0 +1,151 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * I/O Address Space ID allocator. There is one global IOASID space, split into
+ * subsets. Users create a subset with DECLARE_IOASID_SET, then allocate and
+ * free IOASIDs with ioasid_alloc and ioasid_free.
+ */
+#include <linux/ioasid.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/xarray.h>
+
+struct ioasid_data {
+	ioasid_t id;
+	struct ioasid_set *set;
+	void *private;
+	struct rcu_head rcu;
+};
+
+static DEFINE_XARRAY_ALLOC(ioasid_xa);
+
+/**
+ * ioasid_set_data - Set private data for an allocated ioasid
+ * @ioasid: the ID to set data
+ * @data: the private data
+ *
+ * For an IOASID that is already allocated, private data can be set
+ * via this API. Future lookup can be done via ioasid_find.
+ */
+int ioasid_set_data(ioasid_t ioasid, void *data)
+{
+	struct ioasid_data *ioasid_data;
+	int ret = 0;
+
+	xa_lock(&ioasid_xa);
+	ioasid_data = xa_load(&ioasid_xa, ioasid);
+	if (ioasid_data)
+		rcu_assign_pointer(ioasid_data->private, data);
+	else
+		ret = -ENOENT;
+	xa_unlock(&ioasid_xa);
+
+	/*
+	 * Wait for readers to stop accessing the old private data, so the
+	 * caller can free it.
+	 */
+	if (!ret)
+		synchronize_rcu();
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ioasid_set_data);
+
+/**
+ * ioasid_alloc - Allocate an IOASID
+ * @set: the IOASID set
+ * @min: the minimum ID (inclusive)
+ * @max: the maximum ID (inclusive)
+ * @private: data private to the caller
+ *
+ * Allocate an ID between @min and @max. The @private pointer is stored
+ * internally and can be retrieved with ioasid_find().
+ *
+ * Return: the allocated ID on success, or %INVALID_IOASID on failure.
+ */
+ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
+		      void *private)
+{
+	ioasid_t id;
+	struct ioasid_data *data;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return INVALID_IOASID;
+
+	data->set = set;
+	data->private = private;
+
+	if (xa_alloc(&ioasid_xa, &id, data, XA_LIMIT(min, max), GFP_KERNEL)) {
+		pr_err("Failed to alloc ioasid from %d to %d\n", min, max);
+		goto exit_free;
+	}
+	data->id = id;
+
+	return id;
+exit_free:
+	kfree(data);
+	return INVALID_IOASID;
+}
+EXPORT_SYMBOL_GPL(ioasid_alloc);
+
+/**
+ * ioasid_free - Free an IOASID
+ * @ioasid: the ID to remove
+ */
+void ioasid_free(ioasid_t ioasid)
+{
+	struct ioasid_data *ioasid_data;
+
+	ioasid_data = xa_erase(&ioasid_xa,
[PATCH v5 09/19] iommu: Introduce guest PASID bind function
Guest shared virtual address (SVA) may require the host to shadow guest
PASID tables. A guest PASID can also be allocated from the host via
enlightened interfaces. In this case, the guest needs to bind the guest
mm, i.e. cr3 in guest physical address, to the actual PASID table in
the host IOMMU. Nesting will be turned on such that guest virtual
addresses can go through a two-level translation:
- 1st level translates GVA to GPA
- 2nd level translates GPA to HPA

This patch introduces APIs to bind guest PASID data to the assigned
device entry in the physical IOMMU. See the diagram below for usage
explanation.

     .-------------.  .---------------------------.
     |   vIOMMU    |  | Guest process mm, FL only |
     |             |  '---------------------------'
     .----------------/
     | PASID Entry |--- PASID cache flush -
     '-------------'                       |
     |             |                       V
     |             |                      GP
     '-------------'
Guest
------| Shadow |--------- GP->HP* ---------
      v        v          |
Host                      v
     .-------------.  .---------------------.
     |   pIOMMU    |  | Bind FL for GVA-GPA |
     |             |  '---------------------'
     .----------------/  |
     | PASID Entry |     V (Nested xlate)
     '----------------\.-------------------.
     |             |   | Set SL to GPA-HPA |
     |             |   '-------------------'
     '-------------'

Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables
 - GP = Guest PASID
 - HP = Host PASID
 * Conversion needed if the non-identity GP-HP mapping option is chosen.
Signed-off-by: Jacob Pan
Signed-off-by: Liu Yi L
---
 drivers/iommu/iommu.c      |  20 ++++++++
 include/linux/iommu.h      |  22 ++++++++
 include/uapi/linux/iommu.h |  58 ++++++++++++++++++
 3 files changed, 100 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 6228d5d..c19ea1f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1729,6 +1729,26 @@ int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
 }
 EXPORT_SYMBOL_GPL(iommu_cache_invalidate);

+int iommu_sva_bind_gpasid(struct iommu_domain *domain,
+			struct device *dev, struct iommu_gpasid_bind_data *data)
+{
+	if (unlikely(!domain->ops->sva_bind_gpasid))
+		return -ENODEV;
+
+	return domain->ops->sva_bind_gpasid(domain, dev, data);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_bind_gpasid);
+
+int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
+			 ioasid_t pasid)
+{
+	if (unlikely(!domain->ops->sva_unbind_gpasid))
+		return -ENODEV;
+
+	return domain->ops->sva_unbind_gpasid(dev, pasid);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
 				  struct device *dev)
 {

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 28f1a8c..91370e7 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include <linux/ioasid.h>
 #include

 #define IOMMU_READ	(1 << 0)
@@ -232,6 +233,8 @@ struct iommu_sva_ops {
  * @detach_pasid_table: detach the pasid table
  * @cache_invalidate: invalidate translation caches
  * @pgsize_bitmap: bitmap of all possible supported page sizes
+ * @sva_bind_gpasid: bind guest pasid and mm
+ * @sva_unbind_gpasid: unbind guest pasid and mm
  */
 struct iommu_ops {
 	bool (*capable)(enum iommu_cap);
@@ -299,6 +302,10 @@ struct iommu_ops {
 				  struct iommu_page_response *msg);
 	int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
 				struct iommu_cache_invalidate_info *inv_info);
+	int (*sva_bind_gpasid)(struct iommu_domain *domain,
+			struct device *dev, struct iommu_gpasid_bind_data *data);
+
+	int (*sva_unbind_gpasid)(struct device *dev, int pasid);

 	unsigned long pgsize_bitmap;
 };
@@ -413,6 +420,10 @@ extern void iommu_detach_pasid_table(struct iommu_domain *domain);
 extern int iommu_cache_invalidate(struct iommu_domain *domain,
 				  struct device *dev,
 				  struct iommu_cache_invalidate_info *inv_info);
+extern int iommu_sva_bind_gpasid(struct iommu_domain *domain,
+		struct device *dev, struct iommu_gpasid_bind_data *data);
+extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
+		struct device *dev, ioasid_t pasid);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
@@ -972,6 +983,17 @@
[PATCH v5 08/19] iommu/ioasid: Add custom allocators
Custom IOASID allocators can be registered at runtime and take
precedence over the default XArray allocator. They have these
attributes:
- provides platform-specific alloc()/free() functions with private data.
- allocation result lookups are not provided by the allocator; lookup
  requests must be done by the IOASID framework via its own XArray.
- allocators can be unregistered at runtime, falling back either to the
  next custom allocator or to the default allocator.
- custom allocators can share the same set of alloc()/free() helpers,
  in which case they also share the same IOASID space, thus the same
  XArray.
- switching between allocators requires all outstanding IOASIDs to be
  freed unless the two allocators share the same alloc()/free() helpers.

Signed-off-by: Jean-Philippe Brucker
Signed-off-by: Jacob Pan
Link: https://lkml.org/lkml/2019/4/26/462
---
 drivers/iommu/ioasid.c | 302 +++++++++++++++++++++++++++++++++++++++--
 include/linux/ioasid.h |  28 ++++
 2 files changed, 320 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
index 6fbea76..b2d9ea5 100644
--- a/drivers/iommu/ioasid.c
+++ b/drivers/iommu/ioasid.c
@@ -17,7 +17,254 @@ struct ioasid_data {
 	struct rcu_head rcu;
 };

-static DEFINE_XARRAY_ALLOC(ioasid_xa);
+/*
+ * struct ioasid_allocator_data - Internal data structure to hold information
+ * about an allocator. There are two types of allocators:
+ *
+ * - Default allocator always has its own XArray to track the IOASIDs allocated.
+ * - Custom allocators may share allocation helpers with different private data.
+ *   Custom allocators that share the same helper functions also share the same
+ *   XArray.
+ * Rules:
+ * 1. Default allocator is always available, not dynamically registered. This is
+ *    to prevent race conditions with early boot code that wants to register
+ *    custom allocators or allocate IOASIDs.
+ * 2. Custom allocators take precedence over the default allocator.
+ * 3. When all custom allocators sharing the same helper functions are
+ *    unregistered (e.g. due to hotplug), all outstanding IOASIDs must be
+ *    freed.
+ * 4. When switching between custom allocators sharing the same helper
+ *    functions, outstanding IOASIDs are preserved.
+ * 5. When switching between custom allocator and default allocator, all IOASIDs
+ *    must be freed to ensure unadulterated space for the new allocator.
+ *
+ * @ops:   allocator helper functions and its data
+ * @list:  registered custom allocators
+ * @slist: allocators that share the same ops but different data
+ * @flags: attributes of the allocator
+ * @xa:    xarray holding the IOASID space
+ * @users: number of allocators sharing the same ops and XArray
+ */
+struct ioasid_allocator_data {
+	struct ioasid_allocator_ops *ops;
+	struct list_head list;
+	struct list_head slist;
+#define IOASID_ALLOCATOR_CUSTOM BIT(0) /* Needs framework to track results */
+	unsigned long flags;
+	struct xarray xa;
+	refcount_t users;
+};
+
+static DEFINE_MUTEX(ioasid_allocator_lock);
+static LIST_HEAD(allocators_list);
+
+static ioasid_t default_alloc(ioasid_t min, ioasid_t max, void *opaque);
+static void default_free(ioasid_t ioasid, void *opaque);
+
+static struct ioasid_allocator_ops default_ops = {
+	.alloc = default_alloc,
+	.free = default_free,
+};
+
+static struct ioasid_allocator_data default_allocator = {
+	.ops = &default_ops,
+	.flags = 0,
+	.xa = XARRAY_INIT(ioasid_xa, XA_FLAGS_ALLOC),
+};
+
+static struct ioasid_allocator_data *active_allocator = &default_allocator;
+
+static ioasid_t default_alloc(ioasid_t min, ioasid_t max, void *opaque)
+{
+	ioasid_t id;
+
+	if (xa_alloc(&default_allocator.xa, &id, opaque, XA_LIMIT(min, max), GFP_KERNEL)) {
+		pr_err("Failed to alloc ioasid from %d to %d\n", min, max);
+		return INVALID_IOASID;
+	}
+
+	return id;
+}
+
+static void default_free(ioasid_t ioasid, void *opaque)
+{
+	struct ioasid_data *ioasid_data;
+
+	ioasid_data = xa_erase(&default_allocator.xa, ioasid);
+	kfree_rcu(ioasid_data, rcu);
+}
+
+/* Allocate and initialize a new custom allocator with its helper functions */
+static struct ioasid_allocator_data *ioasid_alloc_allocator(struct ioasid_allocator_ops *ops)
+{
+	struct ioasid_allocator_data *ia_data;
+
+	ia_data = kzalloc(sizeof(*ia_data), GFP_KERNEL);
+	if (!ia_data)
+		return NULL;
+
+	xa_init_flags(&ia_data->xa, XA_FLAGS_ALLOC);
+	INIT_LIST_HEAD(&ia_data->slist);
+	ia_data->flags |= IOASID_ALLOCATOR_CUSTOM;
+	ia_data->ops = ops;
+
+	/* For tracking custom allocators that share the same ops */
+	list_add_tail(&ia_data->list, &ia_data->slist);
+	refcount_set(&ia_data->users, 1);
+
+	return ia_data;
+}
+
+static bool use_same_ops(struct ioasid_allocator_ops *a, struct ioasid_allocator_ops *b)
+{
+	return (a->free == b->free) &&
[PATCH v5 11/19] iommu/vt-d: Add custom allocator for IOASID
When VT-d driver runs in the guest, PASID allocation must be performed via virtual command interface. This patch registers a custom IOASID allocator which takes precedence over the default XArray based allocator. The resulting IOASID allocation will always come from the host. This ensures that PASID namespace is system- wide. Signed-off-by: Lu Baolu Signed-off-by: Liu, Yi L Signed-off-by: Jacob Pan --- drivers/iommu/Kconfig | 1 + drivers/iommu/intel-iommu.c | 67 + include/linux/intel-iommu.h | 2 ++ 3 files changed, 70 insertions(+) diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 0ade8a0..d5ca821 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -210,6 +210,7 @@ config INTEL_IOMMU_SVM bool "Support for Shared Virtual Memory with Intel IOMMU" depends on INTEL_IOMMU && X86 select PCI_PASID + select IOASID select MMU_NOTIFIER help Shared Virtual Memory (SVM) provides a facility for devices diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index bdaed2d..b15ec58 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -1693,6 +1693,8 @@ static void free_dmar_iommu(struct intel_iommu *iommu) if (ecap_prs(iommu->ecap)) intel_svm_finish_prq(iommu); } + ioasid_unregister_allocator(>pasid_allocator); + #endif } @@ -4619,6 +4621,46 @@ static int __init probe_acpi_namespace_devices(void) return 0; } +#ifdef CONFIG_INTEL_IOMMU_SVM +static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max, void *data) +{ + struct intel_iommu *iommu = data; + ioasid_t ioasid; + + /* +* VT-d virtual command interface always uses the full 20 bit +* PASID range. Host can partition guest PASID range based on +* policies but it is out of guest's control. 
+*/ + if (min < PASID_MIN || max > PASID_MAX) + return INVALID_IOASID; + + if (vcmd_alloc_pasid(iommu, &ioasid)) + return INVALID_IOASID; + + return ioasid; +} + +static void intel_ioasid_free(ioasid_t ioasid, void *data) +{ + struct iommu_pasid_alloc_info *svm; + struct intel_iommu *iommu = data; + + if (!iommu) + return; + /* +* Sanity check of the ioasid owner is done at the upper layer, e.g. VFIO. +* We can only free the PASID when all the devices are unbound. +*/ + svm = ioasid_find(NULL, ioasid, NULL); + if (!svm) { + pr_warn("Freeing unbound IOASID %d\n", ioasid); + return; + } + vcmd_free_pasid(iommu, ioasid); +} +#endif + int __init intel_iommu_init(void) { int ret = -ENODEV; @@ -4722,6 +4764,31 @@ int __init intel_iommu_init(void) "%s", iommu->name); iommu_device_set_ops(&iommu->iommu, &intel_iommu_ops); iommu_device_register(&iommu->iommu); +#ifdef CONFIG_INTEL_IOMMU_SVM + if (cap_caching_mode(iommu->cap) && sm_supported(iommu)) { + /* +* Register a custom ASID allocator if we are running +* in a guest; the purpose is to have a system-wide PASID +* namespace among all PASID users. +* There can be multiple vIOMMUs in each guest but only +* one allocator is active. All vIOMMU allocators will +* eventually be calling the same host allocator. +*/ + iommu->pasid_allocator.alloc = intel_ioasid_alloc; + iommu->pasid_allocator.free = intel_ioasid_free; + iommu->pasid_allocator.pdata = (void *)iommu; + ret = ioasid_register_allocator(&iommu->pasid_allocator); + if (ret) { + pr_warn("Custom PASID allocator registration failed\n"); + /* +* Disable scalable mode on this IOMMU if there +* is no custom allocator. Mixing an SM-capable vIOMMU +* and a non-SM vIOMMU is not supported. 
+*/ + intel_iommu_sm = 0; + } + } +#endif } bus_set_iommu(&pci_bus_type, &intel_iommu_ops); diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 37fb0c9..80318c5 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -19,6 +19,7 @@ #include #include #include +#include <linux/ioasid.h> #include #include @@ -543,6 +544,7 @@ struct intel_iommu { #ifdef CONFIG_INTEL_IOMMU_SVM struct page_req_dsc *prq; unsigned char prq_name[16]; /* Name for PRQ interrupt */ + struct ioasid_allocator_ops pasid_allocator;
[PATCH v5 14/19] iommu/vt-d: Avoid duplicated code for PASID setup
After each setup for PASID entry, related translation caches must be flushed. We can combine duplicated code into one function which is less error prone. Signed-off-by: Jacob Pan --- drivers/iommu/intel-pasid.c | 48 + 1 file changed, 18 insertions(+), 30 deletions(-) diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c index c0d1f28..9c5affc 100644 --- a/drivers/iommu/intel-pasid.c +++ b/drivers/iommu/intel-pasid.c @@ -512,6 +512,21 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu, devtlb_invalidation_with_pasid(iommu, dev, pasid); } +static void pasid_flush_caches(struct intel_iommu *iommu, + struct pasid_entry *pte, + int pasid, u16 did) +{ + if (!ecap_coherent(iommu->ecap)) + clflush_cache_range(pte, sizeof(*pte)); + + if (cap_caching_mode(iommu->cap)) { + pasid_cache_invalidation_with_pasid(iommu, did, pasid); + iotlb_invalidation_with_pasid(iommu, did, pasid); + } else { + iommu_flush_write_buffer(iommu); + } +} + /* * Set up the scalable mode pasid table entry for first only * translation type. 
@@ -557,16 +572,7 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu, /* Setup Present and PASID Granular Transfer Type: */ pasid_set_translation_type(pte, 1); pasid_set_present(pte); - - if (!ecap_coherent(iommu->ecap)) - clflush_cache_range(pte, sizeof(*pte)); - - if (cap_caching_mode(iommu->cap)) { - pasid_cache_invalidation_with_pasid(iommu, did, pasid); - iotlb_invalidation_with_pasid(iommu, did, pasid); - } else { - iommu_flush_write_buffer(iommu); - } + pasid_flush_caches(iommu, pte, pasid, did); return 0; } @@ -630,16 +636,7 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu, */ pasid_set_sre(pte); pasid_set_present(pte); - - if (!ecap_coherent(iommu->ecap)) - clflush_cache_range(pte, sizeof(*pte)); - - if (cap_caching_mode(iommu->cap)) { - pasid_cache_invalidation_with_pasid(iommu, did, pasid); - iotlb_invalidation_with_pasid(iommu, did, pasid); - } else { - iommu_flush_write_buffer(iommu); - } + pasid_flush_caches(iommu, pte, pasid, did); return 0; } @@ -673,16 +670,7 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu, */ pasid_set_sre(pte); pasid_set_present(pte); - - if (!ecap_coherent(iommu->ecap)) - clflush_cache_range(pte, sizeof(*pte)); - - if (cap_caching_mode(iommu->cap)) { - pasid_cache_invalidation_with_pasid(iommu, did, pasid); - iotlb_invalidation_with_pasid(iommu, did, pasid); - } else { - iommu_flush_write_buffer(iommu); - } + pasid_flush_caches(iommu, pte, pasid, did); return 0; } -- 2.7.4
[PATCH v5 12/19] iommu/vt-d: Replace Intel specific PASID allocator with IOASID
Make use of generic IOASID code to manage PASID allocation, free, and lookup. Replace Intel specific code. Signed-off-by: Jacob Pan --- drivers/iommu/intel-iommu.c | 12 ++-- drivers/iommu/intel-pasid.c | 36 drivers/iommu/intel-svm.c | 37 + 3 files changed, 27 insertions(+), 58 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index b15ec58..96defc3 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -4989,7 +4989,7 @@ static void auxiliary_unlink_device(struct dmar_domain *domain, domain->auxd_refcnt--; if (!domain->auxd_refcnt && domain->default_pasid > 0) - intel_pasid_free_id(domain->default_pasid); + ioasid_free(domain->default_pasid); } static int aux_domain_add_dev(struct dmar_domain *domain, @@ -5007,10 +5007,10 @@ static int aux_domain_add_dev(struct dmar_domain *domain, if (domain->default_pasid <= 0) { int pasid; - pasid = intel_pasid_alloc_id(domain, PASID_MIN, -pci_max_pasids(to_pci_dev(dev)), -GFP_KERNEL); - if (pasid <= 0) { + /* No private data needed for the default pasid */ + pasid = ioasid_alloc(NULL, PASID_MIN, pci_max_pasids(to_pci_dev(dev)) - 1, + NULL); + if (pasid == INVALID_IOASID) { pr_err("Can't allocate default pasid\n"); return -ENODEV; } @@ -5046,7 +5046,7 @@ static int aux_domain_add_dev(struct dmar_domain *domain, spin_unlock(&iommu->lock); spin_unlock_irqrestore(&device_domain_lock, flags); if (!domain->auxd_refcnt && domain->default_pasid > 0) - intel_pasid_free_id(domain->default_pasid); + ioasid_free(domain->default_pasid); return ret; } diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c index 76bcbb2..c0d1f28 100644 --- a/drivers/iommu/intel-pasid.c +++ b/drivers/iommu/intel-pasid.c @@ -26,42 +26,6 @@ */ static DEFINE_SPINLOCK(pasid_lock); u32 intel_pasid_max_id = PASID_MAX; -static DEFINE_IDR(pasid_idr); - -int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp) -{ - int ret, min, max; - - min = max_t(int, start, PASID_MIN); - max = min_t(int, end, 
intel_pasid_max_id); - - WARN_ON(in_interrupt()); - idr_preload(gfp); - spin_lock(&pasid_lock); - ret = idr_alloc(&pasid_idr, ptr, min, max, GFP_ATOMIC); - spin_unlock(&pasid_lock); - idr_preload_end(); - - return ret; -} - -void intel_pasid_free_id(int pasid) -{ - spin_lock(&pasid_lock); - idr_remove(&pasid_idr, pasid); - spin_unlock(&pasid_lock); -} - -void *intel_pasid_lookup_id(int pasid) -{ - void *p; - - spin_lock(&pasid_lock); - p = idr_find(&pasid_idr, pasid); - spin_unlock(&pasid_lock); - - return p; -} static int check_vcmd_pasid(struct intel_iommu *iommu) { diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c index 780de0c..5a688a5 100644 --- a/drivers/iommu/intel-svm.c +++ b/drivers/iommu/intel-svm.c @@ -17,6 +17,7 @@ #include #include #include +#include <linux/ioasid.h> #include #include "intel-pasid.h" @@ -324,16 +325,15 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ if (pasid_max > intel_pasid_max_id) pasid_max = intel_pasid_max_id; - /* Do not use PASID 0 in caching mode (virtualised IOMMU) */ - ret = intel_pasid_alloc_id(svm, - !!cap_caching_mode(iommu->cap), - pasid_max - 1, GFP_KERNEL); - if (ret < 0) { + /* Do not use PASID 0, reserved for RID to PASID */ + svm->pasid = ioasid_alloc(NULL, PASID_MIN, + pasid_max - 1, svm); + if (svm->pasid == INVALID_IOASID) { kfree(svm); kfree(sdev); + ret = -ENOSPC; goto out; } - svm->pasid = ret; svm->notifier.ops = &intel_mmuops; svm->mm = mm; svm->flags = flags; @@ -343,7 +343,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ if (mm) { ret = mmu_notifier_register(&svm->notifier, mm); if (ret) { - intel_pasid_free_id(svm->pasid); + ioasid_free(svm->pasid); kfree(svm); kfree(sdev); goto out; @@ -359,7 +359,7 @@ int intel_svm_unbind_mm(struct device *dev, int
[PATCH v5 16/19] iommu/vt-d: Misc macro clean up for SVM
Use the combined macro for_each_svm_dev() to simplify SVM device iteration and error checking. Suggested-by: Andy Shevchenko Signed-off-by: Jacob Pan Reviewed-by: Eric Auger --- drivers/iommu/intel-svm.c | 85 +++ 1 file changed, 41 insertions(+), 44 deletions(-) diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c index 5a688a5..ea6f2e2 100644 --- a/drivers/iommu/intel-svm.c +++ b/drivers/iommu/intel-svm.c @@ -218,6 +218,10 @@ static const struct mmu_notifier_ops intel_mmuops = { static DEFINE_MUTEX(pasid_mutex); static LIST_HEAD(global_svm_list); +#define for_each_svm_dev(svm, dev) \ + list_for_each_entry(sdev, &svm->devs, list) \ + if (dev == sdev->dev) \ + int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops) { struct intel_iommu *iommu = intel_svm_device_to_iommu(dev); @@ -263,15 +267,13 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ goto out; } - list_for_each_entry(sdev, &svm->devs, list) { - if (dev == sdev->dev) { - if (sdev->ops != ops) { - ret = -EBUSY; - goto out; - } - sdev->users++; - goto success; + for_each_svm_dev(svm, dev) { + if (sdev->ops != ops) { + ret = -EBUSY; + goto out; } + sdev->users++; + goto success; } break; @@ -408,48 +410,43 @@ int intel_svm_unbind_mm(struct device *dev, int pasid) goto out; svm = ioasid_find(NULL, pasid, NULL); - if (IS_ERR(svm)) { + if (IS_ERR_OR_NULL(svm)) { ret = PTR_ERR(svm); goto out; } - if (!svm) - goto out; + for_each_svm_dev(svm, dev) { + ret = 0; + sdev->users--; + if (!sdev->users) { + list_del_rcu(&sdev->list); + /* Flush the PASID cache and IOTLB for this device. +* Note that we do depend on the hardware *not* using +* the PASID any more. Just as we depend on other +* devices never using PASIDs that they have no right +* to use. We have a *shared* PASID table, because it's +* large and has to be physically contiguous. So it's +* hard to be as defensive as we might like. 
*/ + intel_pasid_tear_down_entry(iommu, dev, svm->pasid); + intel_flush_svm_range_dev(svm, sdev, 0, -1, 0, !svm->mm); + kfree_rcu(sdev, rcu); + + if (list_empty(&svm->devs)) { + ioasid_free(svm->pasid); + if (svm->mm) + mmu_notifier_unregister(&svm->notifier, svm->mm); - list_for_each_entry(sdev, &svm->devs, list) { - if (dev == sdev->dev) { - ret = 0; - sdev->users--; - if (!sdev->users) { - list_del_rcu(&sdev->list); - /* Flush the PASID cache and IOTLB for this device. -* Note that we do depend on the hardware *not* using -* the PASID any more. Just as we depend on other -* devices never using PASIDs that they have no right -* to use. We have a *shared* PASID table, because it's -* large and has to be physically contiguous. So it's -* hard to be as defensive as we might like. */ - intel_pasid_tear_down_entry(iommu, dev, svm->pasid); - intel_flush_svm_range_dev(svm, sdev, 0, -1, 0, !svm->mm); - kfree_rcu(sdev, rcu); - - if (list_empty(&svm->devs)) { - ioasid_free(svm->pasid); - if (svm->mm) - mmu_notifier_unregister(&svm->notifier, svm->mm); - - list_del(&svm->list); - - /* We mandate that no page faults may be outstanding -
[PATCH v5 03/19] trace/iommu: Add sva trace events
From: Jean-Philippe Brucker For development only, trace I/O page faults and responses. Signed-off-by: Jacob Pan [JPB: removed the invalidate trace event, that will be added later] Signed-off-by: Jean-Philippe Brucker Signed-off-by: Jacob Pan --- include/trace/events/iommu.h | 84 1 file changed, 84 insertions(+) diff --git a/include/trace/events/iommu.h b/include/trace/events/iommu.h index 72b4582..767b92c 100644 --- a/include/trace/events/iommu.h +++ b/include/trace/events/iommu.h @@ -12,6 +12,8 @@ #define _TRACE_IOMMU_H #include +#include +#include struct device; @@ -161,6 +163,88 @@ DEFINE_EVENT(iommu_error, io_page_fault, TP_ARGS(dev, iova, flags) ); + +TRACE_EVENT(dev_fault, + + TP_PROTO(struct device *dev, struct iommu_fault *evt), + + TP_ARGS(dev, evt), + + TP_STRUCT__entry( + __string(device, dev_name(dev)) + __field(int, type) + __field(int, reason) + __field(u64, addr) + __field(u64, fetch_addr) + __field(u32, pasid) + __field(u32, grpid) + __field(u32, flags) + __field(u32, prot) + ), + + TP_fast_assign( + __assign_str(device, dev_name(dev)); + __entry->type = evt->type; + if (evt->type == IOMMU_FAULT_DMA_UNRECOV) { + __entry->reason = evt->event.reason; + __entry->flags = evt->event.flags; + __entry->pasid = evt->event.pasid; + __entry->grpid = 0; + __entry->prot = evt->event.perm; + __entry->addr = evt->event.addr; + __entry->fetch_addr = evt->event.fetch_addr; + } else { + __entry->reason = 0; + __entry->flags = evt->prm.flags; + __entry->pasid = evt->prm.pasid; + __entry->grpid = evt->prm.grpid; + __entry->prot = evt->prm.perm; + __entry->addr = evt->prm.addr; + __entry->fetch_addr = 0; + } + ), + + TP_printk("IOMMU:%s type=%d reason=%d addr=0x%016llx fetch=0x%016llx pasid=%d group=%d flags=%x prot=%d", + __get_str(device), + __entry->type, + __entry->reason, + __entry->addr, + __entry->fetch_addr, + __entry->pasid, + __entry->grpid, + __entry->flags, + __entry->prot + ) +); + +TRACE_EVENT(dev_page_response, + + TP_PROTO(struct device *dev, struct 
iommu_fault_page_response *msg), + + TP_ARGS(dev, msg), + + TP_STRUCT__entry( + __string(device, dev_name(dev)) + __field(int, code) + __field(u32, pasid) + __field(u32, grpid) + ), + + TP_fast_assign( + __assign_str(device, dev_name(dev)); + __entry->code = msg->code; + __entry->pasid = msg->pasid; + __entry->grpid = msg->grpid; + ), + + TP_printk("IOMMU:%s code=%d pasid=%d group=%d", + __get_str(device), + __entry->code, + __entry->pasid, + __entry->grpid + ) +); + #endif /* _TRACE_IOMMU_H */ /* This part must be outside protection */ -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v5 02/19] iommu: handle page response timeout
When I/O page faults are reported outside the IOMMU subsystem, the page request handler may fail for various reasons, e.g. a guest received page requests but did not get a chance to run for a long time. Such unresponsiveness could hold up limited resources on the pending device. There can be hardware or credit-based software solutions, as suggested in PCI ATS chapter 4. To provide a basic safety net, this patch introduces a per-device deferrable timer which monitors the longest pending page fault that requires a response. Proper action, such as sending a failure response code, could be taken when the timer expires, but is not included in this patch. We need to consider the life cycle of the page group ID to prevent confusion with group IDs reused by a device. For now, a warning message provides a clue to such failures. Signed-off-by: Jacob Pan Signed-off-by: Ashok Raj --- drivers/iommu/iommu.c | 55 +++ include/linux/iommu.h | 4 2 files changed, 59 insertions(+) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 5b26499..8f2c7d5 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -909,6 +909,39 @@ int iommu_group_unregister_notifier(struct iommu_group *group, } EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier); +static void iommu_dev_fault_timer_fn(struct timer_list *t) +{ + struct iommu_fault_param *fparam = from_timer(fparam, t, timer); + struct iommu_fault_event *evt; + struct iommu_fault_page_request *prm; + + u64 now; + + now = get_jiffies_64(); + + /* The goal is to ensure the driver or guest page fault handler (via vfio) +* sends page responses on time. Otherwise, limited queue resources +* may be occupied by some unresponsive guests or drivers. +* When the per-device pending fault list is not empty, we periodically check +* if any anticipated page response time has expired. +* +* TODO: +* We could do the following if response time expires: +* 1. send page response code FAILURE to all pending PRQ +* 2. inform device driver or vfio +* 3. 
drain in-flight page requests and responses for this device +* 4. clear pending fault list such that driver can unregister fault +* handler (otherwise blocked when pending faults are present). +*/ + list_for_each_entry(evt, &fparam->faults, list) { + prm = &evt->fault.prm; + if (time_after64(now, evt->expire)) + pr_err("Page response time expired! pasid %d gid %d exp %llu now %llu\n", + prm->pasid, prm->grpid, evt->expire, now); + } + mod_timer(t, now + prq_timeout); +} + /** * iommu_register_device_fault_handler() - Register a device fault handler * @dev: the device @@ -956,6 +989,9 @@ int iommu_register_device_fault_handler(struct device *dev, mutex_init(&param->fault_param->lock); INIT_LIST_HEAD(&param->fault_param->faults); + if (prq_timeout) + timer_setup(&param->fault_param->timer, iommu_dev_fault_timer_fn, + TIMER_DEFERRABLE); done_unlock: mutex_unlock(&param->lock); @@ -1017,7 +1053,9 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt) struct iommu_param *param = dev->iommu_param; struct iommu_fault_event *evt_pending = NULL; struct iommu_fault_param *fparam; + struct timer_list *tmr; int ret = 0; + u64 exp; if (!param || !evt) return -EINVAL; @@ -1038,7 +1076,17 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt) ret = -ENOMEM; goto done_unlock; } + /* Keep track of response expiration time */ + exp = get_jiffies_64() + prq_timeout; + evt_pending->expire = exp; mutex_lock(&fparam->lock); + if (list_empty(&fparam->faults)) { + /* First pending event, start timer */ + tmr = &dev->iommu_param->fault_param->timer; + WARN_ON(timer_pending(tmr)); + mod_timer(tmr, exp); + } + list_add_tail(&evt_pending->list, &fparam->faults); mutex_unlock(&fparam->lock); } @@ -1103,6 +1151,13 @@ int iommu_page_response(struct device *dev, break; } + /* stop response timer if no more pending request */ + if (list_empty(&param->fault_param->faults) && + timer_pending(&param->fault_param->timer)) { + pr_debug("no pending PRQ, stop timer\n"); + del_timer(&param->fault_param->timer); + } + done_unlock: 
mutex_unlock(&param->fault_param->lock); return ret; diff --git a/include/linux/iommu.h b/include/linux/iommu.h index fdc355c..39d371b 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -317,10
[PATCH v5 19/19] iommu/vt-d: Add svm/sva invalidate function
When Shared Virtual Address (SVA) is enabled for a guest OS via vIOMMU, we need to provide invalidation support at the IOMMU API and driver level. This patch adds an Intel VT-d specific function to implement the iommu passdown invalidate API for shared virtual address. The use case is to support caching structure invalidation for assigned SVM-capable devices. The emulated IOMMU exposes queued invalidation capability and passes down all descriptors from the guest to the physical IOMMU. The assumption is that the guest-to-host device ID mapping is resolved prior to calling the IOMMU driver. Based on the device handle, the host IOMMU driver can replace certain fields before submitting to the invalidation queue. Signed-off-by: Jacob Pan Signed-off-by: Ashok Raj Signed-off-by: Liu, Yi L --- drivers/iommu/intel-iommu.c | 170 1 file changed, 170 insertions(+) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index dcac964..b7ca33a 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -5169,6 +5169,175 @@ static void intel_iommu_aux_detach_device(struct iommu_domain *domain, aux_domain_remove_dev(to_dmar_domain(domain), dev); } +/* + * 2D array for converting and sanitizing IOMMU generic TLB granularity to + * VT-d granularity. Invalidation is typically included in the unmap operation + * as a result of DMA or VFIO unmap. However, an assigned device may own its + * first level page tables without being shadowed by QEMU; in this case there + * is no pass down unmap to the host IOMMU as a result of unmap + * in the guest. Only invalidations are trapped and passed down. + * In all cases, only first level TLB invalidation (request with PASID) can be + * passed down, therefore we do not include IOTLB granularity for requests + * without PASID (second level). 
+ * + * For an example, to find the VT-d granularity encoding for IOTLB + * type and page selective granularity within PASID: + * X: indexed by iommu cache type + * Y: indexed by enum iommu_inv_granularity + * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR] + * + * Granu_map array indicates validity of the table. 1: valid, 0: invalid + * + */ +const static int inv_type_granu_map[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = { + /* PASID based IOTLB, support PASID selective and page selective */ + {0, 1, 1}, + /* PASID based dev TLBs, only support all PASIDs or single PASID */ + {1, 1, 0}, + /* PASID cache */ + {1, 1, 0} +}; + +const static u64 inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = { + /* PASID based IOTLB */ + {0, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID}, + /* PASID based dev TLBs */ + {QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, 0}, + /* PASID cache */ + {QI_PC_ALL_PASIDS, QI_PC_PASID_SEL, 0}, +}; + +static inline int to_vtd_granularity(int type, int granu, u64 *vtd_granu) +{ + if (type >= IOMMU_CACHE_INV_TYPE_NR || granu >= IOMMU_INV_GRANU_NR || + !inv_type_granu_map[type][granu]) + return -EINVAL; + + *vtd_granu = inv_type_granu_table[type][granu]; + + return 0; +} + +static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules) +{ + u64 nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT; + + /* VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9 for 2MB, etc. +* IOMMU cache invalidate API passes granu_size in bytes, and number of +* granu size in contiguous memory. 
+*/ + return order_base_2(nr_pages); +} + +#ifdef CONFIG_INTEL_IOMMU_SVM +static int intel_iommu_sva_invalidate(struct iommu_domain *domain, + struct device *dev, struct iommu_cache_invalidate_info *inv_info) +{ + struct dmar_domain *dmar_domain = to_dmar_domain(domain); + struct device_domain_info *info; + struct intel_iommu *iommu; + unsigned long flags; + int cache_type; + u8 bus, devfn; + u16 did, sid; + int ret = 0; + u64 size; + + if (!inv_info || !dmar_domain || + inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1) + return -EINVAL; + + if (!dev || !dev_is_pci(dev)) + return -ENODEV; + + iommu = device_to_iommu(dev, &bus, &devfn); + if (!iommu) + return -ENODEV; + + spin_lock_irqsave(&device_domain_lock, flags); + spin_lock(&iommu->lock); + info = iommu_support_dev_iotlb(dmar_domain, iommu, bus, devfn); + if (!info) { + ret = -EINVAL; + goto out_unlock; + } + did = dmar_domain->iommu_did[iommu->seq_id]; + sid = PCI_DEVID(bus, devfn); + size = to_vtd_size(inv_info->addr_info.granule_size, inv_info->addr_info.nb_granules); + + for_each_set_bit(cache_type, (unsigned long *)&inv_info->cache, IOMMU_CACHE_INV_TYPE_NR) { + u64 granu = 0; + u64 pasid = 0;
[PATCH v5 18/19] iommu/vt-d: Support flushing more translation cache types
When Shared Virtual Memory is exposed to a guest via vIOMMU, scalable IOTLB invalidation may be passed down from outside IOMMU subsystems. This patch adds invalidation functions that can be used for additional translation cache types. Signed-off-by: Jacob Pan --- drivers/iommu/dmar.c | 46 + drivers/iommu/intel-pasid.c | 3 ++- include/linux/intel-iommu.h | 21 + 3 files changed, 65 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index 5d0754e..1da4c68 100644 --- a/drivers/iommu/dmar.c +++ b/drivers/iommu/dmar.c @@ -1345,6 +1345,21 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr, qi_submit_sync(&desc, iommu); } +/* PASID-based IOTLB Invalidate */ +void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr, u32 pasid, + unsigned int size_order, u64 granu, int ih) +{ + struct qi_desc desc = {.qw2 = 0, .qw3 = 0}; + + desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) | + QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE; + desc.qw1 = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_IH(ih) | + QI_EIOTLB_AM(size_order); + desc.qw2 = 0; + desc.qw3 = 0; + qi_submit_sync(&desc, iommu); +} + void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid, u16 qdep, u64 addr, unsigned mask) { @@ -1368,6 +1383,37 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid, qi_submit_sync(&desc, iommu); } +/* PASID-based device IOTLB Invalidate */ +void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid, + u32 pasid, u16 qdep, u64 addr, unsigned size_order, u64 granu) +{ + struct qi_desc desc; + unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size_order - 1); + + desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) | + QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE | + QI_DEV_IOTLB_PFSID(pfsid); + desc.qw1 = QI_DEV_EIOTLB_GLOB(granu); + + desc.qw1 |= addr & ~mask; + /* If S bit is 0, we only flush a single page. If S bit is set, +* the least significant zero bit indicates the invalidation address +* range. 
VT-d spec 6.5.2.6. +* e.g. address bit 12[0] indicates 8KB, 13[0] indicates 16KB. +*/ + if (size_order) + desc.qw1 |= QI_DEV_EIOTLB_SIZE; + + qi_submit_sync(&desc, iommu); +} + +void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu, int pasid) +{ + struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0}; + + desc.qw0 = QI_PC_PASID(pasid) | QI_PC_DID(did) | QI_PC_GRAN(granu) | QI_PC_TYPE; + qi_submit_sync(&desc, iommu); +} /* * Disable Queued Invalidation interface. */ diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c index fd2c82f..ff7e877 100644 --- a/drivers/iommu/intel-pasid.c +++ b/drivers/iommu/intel-pasid.c @@ -518,7 +518,8 @@ pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu, { struct qi_desc desc; - desc.qw0 = QI_PC_DID(did) | QI_PC_PASID_SEL | QI_PC_PASID(pasid); + desc.qw0 = QI_PC_DID(did) | QI_PC_GRAN(QI_PC_PASID_SEL) | + QI_PC_PASID(pasid) | QI_PC_TYPE; desc.qw1 = 0; desc.qw2 = 0; desc.qw3 = 0; diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index d673b39..682eafa1 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -327,7 +327,7 @@ enum { #define QI_IOTLB_GRAN(gran)(((u64)gran) >> (DMA_TLB_FLUSH_GRANU_OFFSET-4)) #define QI_IOTLB_ADDR(addr)(((u64)addr) & VTD_PAGE_MASK) #define QI_IOTLB_IH(ih)(((u64)ih) << 6) -#define QI_IOTLB_AM(am)(((u8)am)) +#define QI_IOTLB_AM(am)(((u8)am) & 0x3f) #define QI_CC_FM(fm) (((u64)fm) << 48) #define QI_CC_SID(sid) (((u64)sid) << 32) @@ -345,17 +345,22 @@ enum { #define QI_PC_DID(did) (((u64)did) << 16) #define QI_PC_GRAN(gran) (((u64)gran) << 4) -#define QI_PC_ALL_PASIDS (QI_PC_TYPE | QI_PC_GRAN(0)) -#define QI_PC_PASID_SEL(QI_PC_TYPE | QI_PC_GRAN(1)) +/* PASID cache invalidation granu */ +#define QI_PC_ALL_PASIDS 0 +#define QI_PC_PASID_SEL 1 #define QI_EIOTLB_ADDR(addr) ((u64)(addr) & VTD_PAGE_MASK) #define QI_EIOTLB_GL(gl) (((u64)gl) << 7) #define QI_EIOTLB_IH(ih) (((u64)ih) << 6) -#define QI_EIOTLB_AM(am) (((u64)am)) +#define 
QI_EIOTLB_AM(am) (((u64)am) & 0x3f) #define QI_EIOTLB_PASID(pasid) (((u64)pasid) << 32) #define QI_EIOTLB_DID(did) (((u64)did) << 16) #define QI_EIOTLB_GRAN(gran) (((u64)gran) << 4) +/* QI Dev-IOTLB inv granu */ +#define QI_DEV_IOTLB_GRAN_ALL 1 +#define QI_DEV_IOTLB_GRAN_PASID_SEL0 + #define QI_DEV_EIOTLB_ADDR(a) ((u64)(a) & VTD_PAGE_MASK) #define QI_DEV_EIOTLB_SIZE
[PATCH v5 17/19] iommu/vt-d: Add bind guest PASID support
When supporting guest SVA with emulated IOMMU, the guest PASID table is shadowed in VMM. Updates to guest vIOMMU PASID table will result in PASID cache flush which will be passed down to the host as bind guest PASID calls. For the SL page tables, it will be harvested from device's default domain (request w/o PASID), or aux domain in case of mediated device. .-. .---. | vIOMMU| | Guest process CR3, FL only| | | '---' ./ | PASID Entry |--- PASID cache flush - '-' | | | V | |CR3 in GPA '-' Guest --| Shadow |--| vv v Host .-. .--. | pIOMMU| | Bind FL for GVA-GPA | | | '--' ./ | | PASID Entry | V (Nested xlate) '\.--. | | |SL for GPA-HPA, default domain| | | '--' '-' Where: - FL = First level/stage one page tables - SL = Second level/stage two page tables Signed-off-by: Jacob Pan Signed-off-by: Liu, Yi L --- drivers/iommu/intel-iommu.c | 4 + drivers/iommu/intel-svm.c | 184 include/linux/intel-iommu.h | 8 +- include/linux/intel-svm.h | 17 4 files changed, 212 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index d2cc355..dcac964 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -5691,6 +5691,10 @@ const struct iommu_ops intel_iommu_ops = { .dev_disable_feat = intel_iommu_dev_disable_feat, .is_attach_deferred = intel_iommu_is_attach_deferred, .pgsize_bitmap = INTEL_IOMMU_PGSIZES, +#ifdef CONFIG_INTEL_IOMMU_SVM + .sva_bind_gpasid= intel_svm_bind_gpasid, + .sva_unbind_gpasid = intel_svm_unbind_gpasid, +#endif }; static void quirk_iommu_g4x_gfx(struct pci_dev *dev) diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c index ea6f2e2..c6edef2 100644 --- a/drivers/iommu/intel-svm.c +++ b/drivers/iommu/intel-svm.c @@ -222,6 +222,190 @@ static LIST_HEAD(global_svm_list); list_for_each_entry(sdev, >devs, list) \ if (dev == sdev->dev) \ +int intel_svm_bind_gpasid(struct iommu_domain *domain, + struct device *dev, + struct iommu_gpasid_bind_data *data) +{ + struct intel_iommu *iommu = 
intel_svm_device_to_iommu(dev); + struct dmar_domain *ddomain; + struct intel_svm_dev *sdev; + struct intel_svm *svm; + int ret = 0; + + if (WARN_ON(!iommu) || !data) + return -EINVAL; + + if (data->version != IOMMU_GPASID_BIND_VERSION_1 || + data->format != IOMMU_PASID_FORMAT_INTEL_VTD) + return -EINVAL; + + if (dev_is_pci(dev)) { + /* VT-d supports devices with full 20 bit PASIDs only */ + if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX) + return -EINVAL; + } + + /* +* We only check the host PASID range; we have no knowledge to check +* the guest PASID range, nor do we use the guest PASID. +*/ + if (data->hpasid <= 0 || data->hpasid >= PASID_MAX) + return -EINVAL; + + ddomain = to_dmar_domain(domain); + /* REVISIT: +* Sanity check address width and paging mode support +* width matching in two dimensions: +* 1. paging mode CPU <= IOMMU +* 2. address width Guest <= Host. +*/ + mutex_lock(&pasid_mutex); + svm = ioasid_find(NULL, data->hpasid, NULL); + if (IS_ERR(svm)) { + ret = PTR_ERR(svm); + goto out; + } + if (svm) { + /* +* If we found an svm for the PASID, there must be at +* least one device bound; otherwise the svm should be freed. +*/ + BUG_ON(list_empty(&svm->devs)); + + for_each_svm_dev(svm, dev) { + /* In case multiple sub-devices of the same pdev are assigned, we should +* allow multiple bind calls with the same PASID and pdev. +*/ + sdev->users++; + goto out; + } + } else { + /* We come here when the PASID has never been bound to a device. */ + svm = kzalloc(sizeof(*svm), GFP_KERNEL); + if (!svm) { + ret = -ENOMEM; + goto out; + } + /* REVISIT: upper
[PATCH v5 01/19] iommu: Add a timeout parameter for PRQ response
When an I/O page request is processed outside the IOMMU subsystem, the response can be delayed or lost. Add a tunable setup parameter such that the user can choose the timeout for the IOMMU to track pending page requests. This timeout mechanism is a basic safety net which can be implemented in conjunction with credit-based or device-level page response exception handling. Signed-off-by: Jacob Pan --- Documentation/admin-guide/kernel-parameters.txt | 8 ++ drivers/iommu/iommu.c | 33 + 2 files changed, 41 insertions(+) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 47d981a..7da5a83 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1817,6 +1817,14 @@ 1 - Bypass the IOMMU for DMA. unset - Use value of CONFIG_IOMMU_DEFAULT_PASSTHROUGH. + iommu.prq_timeout= + Timeout in seconds to wait for page response + of a pending page request. + Format: + Default: 10 + 0 - no timeout tracking + 1 to 100 - allowed range + io7=[HW] IO7 for Marvel based alpha systems See comment before marvel_specify_io7 in arch/alpha/kernel/core_marvel.c. diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 0c674d8..5b26499 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -33,6 +33,19 @@ static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_DMA; #endif static bool iommu_dma_strict __read_mostly = true; +/* + * Timeout to wait for page response of a pending page request. This is + * intended as a basic safety net in case a pending page request is not + * responded to for an exceptionally long time. A device may also implement + * its own protection mechanism against this exception. + * Units are in jiffies with a range between 1 - 100 seconds equivalent. + * Defaults to 10 seconds. + * Setting 0 means no timeout tracking. 
+ */ +#define IOMMU_PAGE_RESPONSE_MAX_TIMEOUT (HZ * 100) +#define IOMMU_PAGE_RESPONSE_DEF_TIMEOUT (HZ * 10) +static unsigned long prq_timeout = IOMMU_PAGE_RESPONSE_DEF_TIMEOUT; + struct iommu_group { struct kobject kobj; struct kobject *devices_kobj; @@ -176,6 +189,26 @@ static int __init iommu_dma_setup(char *str) } early_param("iommu.strict", iommu_dma_setup); +static int __init iommu_set_prq_timeout(char *str) +{ + int ret; + unsigned long timeout; + + if (!str) + return -EINVAL; + + ret = kstrtoul(str, 10, &timeout); + if (ret) + return ret; + timeout = timeout * HZ; + if (timeout > IOMMU_PAGE_RESPONSE_MAX_TIMEOUT) + return -EINVAL; + prq_timeout = timeout; + + return 0; +} +early_param("iommu.prq_timeout", iommu_set_prq_timeout); + static ssize_t iommu_group_attr_show(struct kobject *kobj, struct attribute *__attr, char *buf) { -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v5 10/19] iommu/vt-d: Enlightened PASID allocation
From: Lu Baolu If the Intel IOMMU runs in caching mode, a.k.a. virtual IOMMU, the IOMMU driver should rely on the emulation software to allocate and free PASIDs. The Intel VT-d spec revision 3.0 defines a register set to support this. This includes a capability register, a virtual command register and a virtual response register. Refer to sections 10.4.42, 10.4.43, 10.4.44 for more information. This patch adds the enlightened PASID allocation/free interfaces via the virtual command register. Cc: Ashok Raj Cc: Jacob Pan Cc: Kevin Tian Signed-off-by: Liu Yi L Signed-off-by: Lu Baolu Signed-off-by: Jacob Pan Reviewed-by: Eric Auger --- drivers/iommu/intel-pasid.c | 83 + drivers/iommu/intel-pasid.h | 13 ++- include/linux/intel-iommu.h | 2 ++ 3 files changed, 97 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c index 040a445..76bcbb2 100644 --- a/drivers/iommu/intel-pasid.c +++ b/drivers/iommu/intel-pasid.c @@ -63,6 +63,89 @@ void *intel_pasid_lookup_id(int pasid) return p; } +static int check_vcmd_pasid(struct intel_iommu *iommu) +{ + u64 cap; + + if (!ecap_vcs(iommu->ecap)) { + pr_warn("IOMMU: %s: Hardware doesn't support virtual command\n", + iommu->name); + return -ENODEV; + } + + cap = dmar_readq(iommu->reg + DMAR_VCCAP_REG); + if (!(cap & DMA_VCS_PAS)) { + pr_warn("IOMMU: %s: Emulation software doesn't support PASID allocation\n", + iommu->name); + return -ENODEV; + } + + return 0; +} + +int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid) +{ + u64 res; + u8 status_code; + unsigned long flags; + int ret = 0; + + ret = check_vcmd_pasid(iommu); + if (ret) + return ret; + + raw_spin_lock_irqsave(&iommu->register_lock, flags); + dmar_writeq(iommu->reg + DMAR_VCMD_REG, VCMD_CMD_ALLOC); + IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq, + !(res & VCMD_VRSP_IP), res); + raw_spin_unlock_irqrestore(&iommu->register_lock, flags); + + status_code = VCMD_VRSP_SC(res); + switch (status_code) { + case VCMD_VRSP_SC_SUCCESS: + 
*pasid = VCMD_VRSP_RESULT(res); + break; + case VCMD_VRSP_SC_NO_PASID_AVAIL: + pr_info("IOMMU: %s: No PASID available\n", iommu->name); + ret = -ENOMEM; + break; + default: + ret = -ENODEV; + pr_warn("IOMMU: %s: Unexpected error code %d\n", + iommu->name, status_code); + } + + return ret; +} + +void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid) +{ + u64 res; + u8 status_code; + unsigned long flags; + + if (check_vcmd_pasid(iommu)) + return; + + raw_spin_lock_irqsave(&iommu->register_lock, flags); + dmar_writeq(iommu->reg + DMAR_VCMD_REG, (pasid << 8) | VCMD_CMD_FREE); + IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq, + !(res & VCMD_VRSP_IP), res); + raw_spin_unlock_irqrestore(&iommu->register_lock, flags); + + status_code = VCMD_VRSP_SC(res); + switch (status_code) { + case VCMD_VRSP_SC_SUCCESS: + break; + case VCMD_VRSP_SC_INVALID_PASID: + pr_info("IOMMU: %s: Invalid PASID\n", iommu->name); + break; + default: + pr_warn("IOMMU: %s: Unexpected error code %d\n", + iommu->name, status_code); + } +} + /* * Per device pasid table management: */ diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h index fc8cd8f..e413e88 100644 --- a/drivers/iommu/intel-pasid.h +++ b/drivers/iommu/intel-pasid.h @@ -23,6 +23,16 @@ #define is_pasid_enabled(entry) (((entry)->lo >> 3) & 0x1) #define get_pasid_dir_size(entry) (1 << ((((entry)->lo >> 9) & 0x7) + 7)) +/* Virtual command interface for enlightened pasid management. */ +#define VCMD_CMD_ALLOC 0x1 +#define VCMD_CMD_FREE 0x2 +#define VCMD_VRSP_IP 0x1 +#define VCMD_VRSP_SC(e) (((e) >> 1) & 0x3) +#define VCMD_VRSP_SC_SUCCESS 0 +#define VCMD_VRSP_SC_NO_PASID_AVAIL 1 +#define VCMD_VRSP_SC_INVALID_PASID 1 +#define VCMD_VRSP_RESULT(e) (((e) >> 8) & 0xf) + /* * Domain ID reserved for pasid entries programmed for first-level * only and pass-through transfer modes. 
@@ -95,5 +105,6 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu, struct device *dev, int pasid); void intel_pasid_tear_down_entry(struct intel_iommu *iommu, struct device *dev, int pasid); - +int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid);
[PATCH v5 06/19] iommu: Introduce cache_invalidate API
From: Yi L Liu In any virtualization use case, when the first translation stage is "owned" by the guest OS, the host IOMMU driver has no knowledge of caching structure updates unless the guest invalidation activities are trapped by the virtualizer and passed down to the host. Since the invalidation data are obtained from user space and will be written into physical IOMMU, we must allow security check at various layers. Therefore, generic invalidation data format are proposed here, model specific IOMMU drivers need to convert them into their own format. Signed-off-by: Liu, Yi L Signed-off-by: Jacob Pan Signed-off-by: Ashok Raj Signed-off-by: Eric Auger Signed-off-by: Jean-Philippe Brucker --- drivers/iommu/iommu.c | 10 + include/linux/iommu.h | 14 ++ include/uapi/linux/iommu.h | 110 + 3 files changed, 134 insertions(+) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 155ebef..6228d5d 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1719,6 +1719,16 @@ void iommu_detach_pasid_table(struct iommu_domain *domain) } EXPORT_SYMBOL_GPL(iommu_detach_pasid_table); +int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev, + struct iommu_cache_invalidate_info *inv_info) +{ + if (unlikely(!domain->ops->cache_invalidate)) + return -ENODEV; + + return domain->ops->cache_invalidate(domain, dev, inv_info); +} +EXPORT_SYMBOL_GPL(iommu_cache_invalidate); + static void __iommu_detach_device(struct iommu_domain *domain, struct device *dev) { diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 8c64065..28f1a8c 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -230,6 +230,7 @@ struct iommu_sva_ops { * @page_response: handle page request response * @attach_pasid_table: attach a pasid table * @detach_pasid_table: detach the pasid table + * @cache_invalidate: invalidate translation caches * @pgsize_bitmap: bitmap of all possible supported page sizes */ struct iommu_ops { @@ -296,6 +297,8 @@ struct iommu_ops { 
int (*page_response)(struct device *dev, struct iommu_fault_event *evt, struct iommu_page_response *msg); + int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev, + struct iommu_cache_invalidate_info *inv_info); unsigned long pgsize_bitmap; }; @@ -407,6 +410,9 @@ extern void iommu_detach_device(struct iommu_domain *domain, extern int iommu_attach_pasid_table(struct iommu_domain *domain, struct iommu_pasid_table_config *cfg); extern void iommu_detach_pasid_table(struct iommu_domain *domain); +extern int iommu_cache_invalidate(struct iommu_domain *domain, + struct device *dev, + struct iommu_cache_invalidate_info *inv_info); extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev); extern struct iommu_domain *iommu_get_dma_domain(struct device *dev); extern int iommu_map(struct iommu_domain *domain, unsigned long iova, @@ -959,6 +965,14 @@ int iommu_attach_pasid_table(struct iommu_domain *domain, static inline void iommu_detach_pasid_table(struct iommu_domain *domain) {} +static inline int +iommu_cache_invalidate(struct iommu_domain *domain, + struct device *dev, + struct iommu_cache_invalidate_info *inv_info) +{ + return -ENODEV; +} + #endif /* CONFIG_IOMMU_API */ #ifdef CONFIG_IOMMU_DEBUGFS diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h index 0f9d249..919ea02 100644 --- a/include/uapi/linux/iommu.h +++ b/include/uapi/linux/iommu.h @@ -203,4 +203,114 @@ struct iommu_pasid_table_config { }; }; +/* defines the granularity of the invalidation */ +enum iommu_inv_granularity { + IOMMU_INV_GRANU_DOMAIN, /* domain-selective invalidation */ + IOMMU_INV_GRANU_PASID, /* PASID-selective invalidation */ + IOMMU_INV_GRANU_ADDR, /* page-selective invalidation */ + IOMMU_INV_GRANU_NR, /* number of invalidation granularities */ +}; + +/** + * struct iommu_inv_addr_info - Address Selective Invalidation Structure + * + * @flags: indicates the granularity of the address-selective invalidation + * - If the PASID bit is set, the 
@pasid field is populated and the invalidation + * relates to cache entries tagged with this PASID and matching the address + * range. + * - If ARCHID bit is set, @archid is populated and the invalidation relates + * to cache entries tagged with this architecture specific ID and matching + * the address range. + * - Both PASID and ARCHID can be set as they may tag different caches. + * - If neither PASID or ARCHID is set, global addr invalidation applies. + * - The LEAF flag indicates whether
[PATCH v5 15/19] iommu/vt-d: Add nested translation helper function
Nested translation mode is supported in VT-d 3.0 Spec, Ch. 3.8. With the PASID granular translation type set to 0x11b, the translation result from the first level (FL) is also subject to a second level (SL) page table translation. This mode is used for SVA virtualization, where FL performs guest virtual to guest physical translation and SL performs guest physical to host physical translation. Signed-off-by: Jacob Pan Signed-off-by: Liu, Yi L --- drivers/iommu/intel-pasid.c | 207 drivers/iommu/intel-pasid.h | 12 +++ 2 files changed, 219 insertions(+) diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c index 9c5affc..fd2c82f 100644 --- a/drivers/iommu/intel-pasid.c +++ b/drivers/iommu/intel-pasid.c @@ -442,6 +442,76 @@ pasid_set_flpm(struct pasid_entry *pe, u64 value) pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2); } +/* + * Setup the Extended Memory Type (EMT) field (Bits 91-93) + * of a scalable mode PASID entry. + */ +static inline void +pasid_set_emt(struct pasid_entry *pe, u64 value) +{ + pasid_set_bits(&pe->val[1], GENMASK_ULL(29, 27), value << 27); +} + +/* + * Setup the Page Attribute Table (PAT) field (Bits 96-127) + * of a scalable mode PASID entry. + */ +static inline void +pasid_set_pat(struct pasid_entry *pe, u64 value) +{ + pasid_set_bits(&pe->val[1], GENMASK_ULL(63, 32), value << 32); +} + +/* + * Setup the Cache Disable (CD) field (Bit 89) + * of a scalable mode PASID entry. + */ +static inline void +pasid_set_cd(struct pasid_entry *pe) +{ + pasid_set_bits(&pe->val[1], 1 << 25, 1); +} + +/* + * Setup the Extended Memory Type Enable (EMTE) field (Bit 90) + * of a scalable mode PASID entry. + */ +static inline void +pasid_set_emte(struct pasid_entry *pe) +{ + pasid_set_bits(&pe->val[1], 1 << 26, 1); +} + +/* + * Setup the Extended Access Flag Enable (EAFE) field (Bit 135) + * of a scalable mode PASID entry. 
+ */ +static inline void +pasid_set_eafe(struct pasid_entry *pe) +{ + pasid_set_bits(&pe->val[2], 1 << 7, 1); +} + +/* + * Setup the Page-level Cache Disable (PCD) field (Bit 95) + * of a scalable mode PASID entry. + */ +static inline void +pasid_set_pcd(struct pasid_entry *pe) +{ + pasid_set_bits(&pe->val[1], 1 << 31, 1); +} + +/* + * Setup the Page-level Write-Through (PWT) field (Bit 94) + * of a scalable mode PASID entry. + */ +static inline void +pasid_set_pwt(struct pasid_entry *pe) +{ + pasid_set_bits(&pe->val[1], 1 << 30, 1); +} + static void pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu, u16 did, int pasid) @@ -674,3 +744,140 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu, return 0; } + +static int intel_pasid_setup_bind_data(struct intel_iommu *iommu, + struct pasid_entry *pte, + struct iommu_gpasid_bind_data_vtd *pasid_data) +{ + /* +* Not all guest PASID table entry fields are passed down during bind, +* here we only set up the ones that are dependent on guest settings. +* Execution related bits such as NXE, SMEP are not meaningful to IOMMU, +* therefore not set. Other fields, such as snoop related, are set based +* on host needs regardless of guest settings. +*/ + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_SRE) { + if (!ecap_srs(iommu->ecap)) { + pr_err("No supervisor request support on %s\n", + iommu->name); + return -EINVAL; + } + pasid_set_sre(pte); + } + + if ((pasid_data->flags & IOMMU_SVA_VTD_GPASID_EAFE) && ecap_eafs(iommu->ecap)) + pasid_set_eafe(pte); + + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_EMTE) { + pasid_set_emte(pte); + pasid_set_emt(pte, pasid_data->emt); + } + + /* +* Memory type is only applicable to devices inside processor coherent +* domain. PCIe devices are not included. We can skip the rest of the +* flags if IOMMU does not support MTS. 
+*/ + if (!ecap_mts(iommu->ecap)) { + pr_info("%s does not support memory type bind guest PASID\n", + iommu->name); + return 0; + } + + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_PCD) + pasid_set_pcd(pte); + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_PWT) + pasid_set_pwt(pte); + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_CD) + pasid_set_cd(pte); + pasid_set_pat(pte, pasid_data->pat); + + return 0; + +} + +/** + * intel_pasid_setup_nested() - Set up PASID entry for nested translation + * which is used for vSVA. The first level page tables are used for + * GVA-GPA translation in the guest, second level page tables are used + * for GPA to HPA translation. + * + * @iommu:
[PATCH v5 13/19] iommu/vt-d: Move domain helper to header
Move domain helper to header to be used by SVA code. Signed-off-by: Jacob Pan Reviewed-by: Eric Auger --- drivers/iommu/intel-iommu.c | 6 -- include/linux/intel-iommu.h | 6 ++ 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 96defc3..d2cc355 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -418,12 +418,6 @@ static void init_translation_status(struct intel_iommu *iommu) iommu->flags |= VTD_FLAG_TRANS_PRE_ENABLED; } -/* Convert generic 'struct iommu_domain to private struct dmar_domain */ -static struct dmar_domain *to_dmar_domain(struct iommu_domain *dom) -{ - return container_of(dom, struct dmar_domain, domain); -} - static int __init intel_iommu_setup(char *str) { if (!str) diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 80318c5..e1865f1 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -591,6 +591,12 @@ static inline void __iommu_flush_cache( clflush_cache_range(addr, size); } +/* Convert generic struct iommu_domain to private struct dmar_domain */ +static inline struct dmar_domain *to_dmar_domain(struct iommu_domain *dom) +{ + return container_of(dom, struct dmar_domain, domain); +} + /* * 0: readable * 1: writable -- 2.7.4
[PATCH v5 00/19] Shared virtual address IOMMU and VT-d support
Shared virtual address (SVA), a.k.a. shared virtual memory (SVM), on Intel platforms allows address space sharing between device DMA and applications. SVA can reduce programming complexity and enhance security. This series is intended to enable SVA virtualization, i.e. shared guest application address space and physical device DMA address. Only the IOMMU portion of the changes is included in this series. Additional support is needed in VFIO and QEMU (will be submitted separately) to complete this functionality. To make incremental changes and reduce the size of each patchset, this series does not include support for page request services. In the VT-d implementation, the PASID table is per device and maintained in the host. The guest PASID table is shadowed in the VMM where the virtual IOMMU is emulated.

      .-------------.  .---------------------------.
      |   vIOMMU    |  | Guest process CR3, FL only|
      |             |  '---------------------------'
      .----------------/
      | PASID Entry |--- PASID cache flush -
      '-------------'                       |
      |             |                       V
      |             |                  CR3 in GPA
      '-------------'
 Guest ------| Shadow |-----------------------|-----------
             v        v                       v
 Host  .-------------.  .----------------------.
       |   pIOMMU    |  | Bind FL for GVA-GPA  |
       |             |  '----------------------'
       .----------------/  |
       | PASID Entry |     V (Nested xlate)
       '----------------\.------------------------------.
       |             |   |SL for GPA-HPA, default domain|
       |             |   '------------------------------'
       '-------------'

Where: - FL = First level/stage one page tables - SL = Second level/stage two page tables This work is based on collaboration with other developers on the IOMMU mailing list. Notably, [1] Common APIs git://linux-arm.org/linux-jpb.git sva/api [2] [RFC PATCH 2/6] drivers core: Add I/O ASID allocator by Jean-Philippe Brucker https://www.spinics.net/lists/iommu/msg30639.html [3] [RFC PATCH 0/5] iommu: APIs for paravirtual PASID allocation by Lu Baolu https://lkml.org/lkml/2018/11/12/1921 [4] [PATCH v5 00/23] IOMMU and VT-d driver support for Shared Virtual Address (SVA) https://lwn.net/Articles/754331/ There are roughly three parts: 1. Generic PASID allocator [1] with extension to support custom allocators 2. IOMMU cache invalidation passdown from guest to host 3. Guest PASID bind for nested translation All generic IOMMU APIs are reused from [1] with minor tweaks. 
With this patchset, guest SVA without page request works on VT-d. PRS patches will come next as we try to avoid large patchset that is hard to review. The patches for basic SVA support (w/o PRS) starts: [PATCH v5 05/19] iommu: Introduce attach/detach_pasid_table API It is worth noting that unlike sMMU nested stage setup, where PASID table is owned by the guest, VT-d PASID table is owned by the host, individual PASIDs are bound instead of the PASID table. This series is based on the new VT-d 3.0 Specification (https://software.intel.com/sites/default/files/managed/c5/15/vt-directed-io-spec.pdf). This is different than the older series in [4] which was based on the older specification that does not have scalable mode. ChangeLog: - V5 Rebased on v5.3-rc4 which has some of the IOMMU fault APIs merged. Addressed v4 review comments from Eric Auger, Baolu Lu, and Jonathan Cameron. Specific changes are as follows: - Refined custom IOASID allocator to support multiple vIOMMU, hotplug cases. - Extracted vendor data from IOMMU guest PASID bind data, for VT-d will support all necessary guest PASID entry fields for PASID bind. - Support non-identity host-guest PASID mapping - Exception handling in various cases - V4 - Redesigned IOASID allocator such that it can support custom allocators with shared helper functions. Use separate XArray to store IOASIDs per allocator. Took advice from Eric Auger to have default allocator use the generic allocator structure. Combined into one patch in that the default allocator is just "another" allocator now. Can be built as a module in case of driver use without IOMMU. - Extended bind guest PASID data to support SMMU and non-identity guest to host PASID mapping https://lkml.org/lkml/2019/5/21/802 - Rebased on Jean's sva/api common tree, new patches starts with [PATCH v4 10/22] - V3 - Addressed thorough review comments from Eric Auger (Thank you!) 
- Moved IOASID allocator from driver core to IOMMU code per suggestion by Christoph Hellwig (https://lkml.org/lkml/2019/4/26/462) - Rebased on top of Jean's SVA API branch and Eric's v7[1] (git://linux-arm.org/linux-jpb.git
[PATCH v5 05/19] iommu: Introduce attach/detach_pasid_table API
In virtualization use case, when a guest is assigned a PCI host device, protected by a virtual IOMMU on the guest, the physical IOMMU must be programmed to be consistent with the guest mappings. If the physical IOMMU supports two translation stages it makes sense to program guest mappings onto the first stage/level (ARM/Intel terminology) while the host owns the stage/level 2. In that case, it is mandated to trap on guest configuration settings and pass those to the physical iommu driver. This patch adds a new API to the iommu subsystem that allows to set/unset the pasid table information. A generic iommu_pasid_table_config struct is introduced in a new iommu.h uapi header. This is going to be used by the VFIO user API. Signed-off-by: Jean-Philippe Brucker Signed-off-by: Liu, Yi L Signed-off-by: Ashok Raj Signed-off-by: Jacob Pan Signed-off-by: Eric Auger Reviewed-by: Jean-Philippe Brucker --- drivers/iommu/iommu.c | 19 + include/linux/iommu.h | 18 include/uapi/linux/iommu.h | 51 ++ 3 files changed, 88 insertions(+) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index feada31..155ebef 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1700,6 +1700,25 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev) } EXPORT_SYMBOL_GPL(iommu_attach_device); +int iommu_attach_pasid_table(struct iommu_domain *domain, +struct iommu_pasid_table_config *cfg) +{ + if (unlikely(!domain->ops->attach_pasid_table)) + return -ENODEV; + + return domain->ops->attach_pasid_table(domain, cfg); +} +EXPORT_SYMBOL_GPL(iommu_attach_pasid_table); + +void iommu_detach_pasid_table(struct iommu_domain *domain) +{ + if (unlikely(!domain->ops->detach_pasid_table)) + return; + + domain->ops->detach_pasid_table(domain); +} +EXPORT_SYMBOL_GPL(iommu_detach_pasid_table); + static void __iommu_detach_device(struct iommu_domain *domain, struct device *dev) { diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 39d371b..8c64065 100644 --- 
a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -228,6 +228,8 @@ struct iommu_sva_ops { * @sva_unbind: Unbind process address space from device * @sva_get_pasid: Get PASID associated to a SVA handle * @page_response: handle page request response + * @attach_pasid_table: attach a pasid table + * @detach_pasid_table: detach the pasid table * @pgsize_bitmap: bitmap of all possible supported page sizes */ struct iommu_ops { @@ -287,6 +289,9 @@ struct iommu_ops { void *drvdata); void (*sva_unbind)(struct iommu_sva *handle); int (*sva_get_pasid)(struct iommu_sva *handle); + int (*attach_pasid_table)(struct iommu_domain *domain, + struct iommu_pasid_table_config *cfg); + void (*detach_pasid_table)(struct iommu_domain *domain); int (*page_response)(struct device *dev, struct iommu_fault_event *evt, @@ -399,6 +404,9 @@ extern int iommu_attach_device(struct iommu_domain *domain, struct device *dev); extern void iommu_detach_device(struct iommu_domain *domain, struct device *dev); +extern int iommu_attach_pasid_table(struct iommu_domain *domain, + struct iommu_pasid_table_config *cfg); +extern void iommu_detach_pasid_table(struct iommu_domain *domain); extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev); extern struct iommu_domain *iommu_get_dma_domain(struct device *dev); extern int iommu_map(struct iommu_domain *domain, unsigned long iova, @@ -941,6 +949,16 @@ static inline int iommu_sva_get_pasid(struct iommu_sva *handle) return IOMMU_PASID_INVALID; } +static inline +int iommu_attach_pasid_table(struct iommu_domain *domain, +struct iommu_pasid_table_config *cfg) +{ + return -ENODEV; +} + +static inline +void iommu_detach_pasid_table(struct iommu_domain *domain) {} + #endif /* CONFIG_IOMMU_API */ #ifdef CONFIG_IOMMU_DEBUGFS diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h index fc00c5d..0f9d249 100644 --- a/include/uapi/linux/iommu.h +++ b/include/uapi/linux/iommu.h @@ -152,4 +152,55 @@ struct iommu_page_response { __u32 code; 
}; +/** + * struct iommu_pasid_smmuv3 - ARM SMMUv3 Stream Table Entry stage 1 related + * information + * @version: API version of this structure + * @s1fmt: STE s1fmt (format of the CD table: single CD, linear table + * or 2-level table) + * @s1dss: STE s1dss (specifies the behavior when @pasid_bits != 0 + * and no PASID is passed along with the incoming transaction) + * @padding: reserved for future use (should be zero) + * + * The PASID table is
[PATCH v5 04/19] iommu: Use device fault trace event
From: Jean-Philippe Brucker For performance and debugging purposes, these trace events help analyzing device faults that interact with IOMMU subsystem. E.g. IOMMU::00:0a.0 type=2 reason=0 addr=0x007ff000 pasid=1 group=1 last=0 prot=1 Signed-off-by: Jacob Pan [JPB: removed invalidate event, that will be added later] Signed-off-by: Jean-Philippe Brucker Signed-off-by: Jacob Pan --- drivers/iommu/iommu.c| 2 ++ include/trace/events/iommu.h | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 8f2c7d5..feada31 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1098,6 +1098,7 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt) mutex_unlock(>lock); kfree(evt_pending); } + trace_dev_fault(dev, >fault); done_unlock: mutex_unlock(>lock); return ret; @@ -1146,6 +1147,7 @@ int iommu_page_response(struct device *dev, msg->flags = pasid_valid ? IOMMU_PAGE_RESP_PASID_VALID : 0; ret = domain->ops->page_response(dev, evt, msg); + trace_dev_page_response(dev, msg); list_del(>list); kfree(evt); break; diff --git a/include/trace/events/iommu.h b/include/trace/events/iommu.h index 767b92c..7a7801b 100644 --- a/include/trace/events/iommu.h +++ b/include/trace/events/iommu.h @@ -219,7 +219,7 @@ TRACE_EVENT(dev_fault, TRACE_EVENT(dev_page_response, - TP_PROTO(struct device *dev, struct iommu_fault_page_response *msg), + TP_PROTO(struct device *dev, struct iommu_page_response *msg), TP_ARGS(dev, msg), -- 2.7.4
Re: [PATCH v3 hmm 08/11] drm/radeon: use mmu_notifier_get/put for struct radeon_mn
On Thu, Aug 15, 2019 at 10:28:21AM +0200, Christian König wrote: > Am 07.08.19 um 01:15 schrieb Jason Gunthorpe: > > From: Jason Gunthorpe > > > > radeon is using a device global hash table to track what mmu_notifiers > > have been registered on struct mm. This is better served with the new > > get/put scheme instead. > > > > radeon has a bug where it was not blocking notifier release() until all > > the BO's had been invalidated. This could result in a use after free of > > pages the BOs. This is tied into a second bug where radeon left the > > notifiers running endlessly even once the interval tree became > > empty. This could result in a use after free with module unload. > > > > Both are fixed by changing the lifetime model, the BOs exist in the > > interval tree with their natural lifetimes independent of the mm_struct > > lifetime using the get/put scheme. The release runs synchronously and just > > does invalidate_start across the entire interval tree to create the > > required DMA fence. > > > > Additions to the interval tree after release are already impossible as > > only current->mm is used during the add. > > > > Signed-off-by: Jason Gunthorpe > > Acked-by: Christian König Thanks! > But I'm wondering if we shouldn't completely drop radeon userptr support. > It's just to buggy, I would not object :) Jason
[PATCH v2 11/17] iommu/arm-smmu: Abstract GR0 accesses
Clean up the remaining accesses to GR0 registers, so that everything is now neatly abstracted. This folds up the Non-Secure alias quirk as the first step towards moving it out of the way entirely. Although GR0 does technically contain some 64-bit registers (sGFAR and the weird SMMUv2 HYPC and MONC stuff), they're not ones we have any need to access. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu.c | 106 +-- 1 file changed, 58 insertions(+), 48 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index e72554f334ee..e9fd9117109e 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -69,19 +69,6 @@ /* Maximum number of context banks per SMMU */ #define ARM_SMMU_MAX_CBS 128 -/* SMMU global address space */ -#define ARM_SMMU_GR0(smmu) ((smmu)->base) - -/* - * SMMU global address space with conditional offset to access secure - * aliases of non-secure registers (e.g. nsCR0: 0x400, nsGFSR: 0x448, - * nsGFSYNR0: 0x450) - */ -#define ARM_SMMU_GR0_NS(smmu) \ - ((smmu)->base + \ - ((smmu->options & ARM_SMMU_OPT_SECURE_CFG_ACCESS) \ - ? 
0x400 : 0)) - #define MSI_IOVA_BASE 0x800 #define MSI_IOVA_LENGTH0x10 @@ -246,6 +233,21 @@ struct arm_smmu_domain { struct iommu_domain domain; }; +static int arm_smmu_gr0_ns(int offset) +{ + switch(offset) { + case ARM_SMMU_GR0_sCR0: + case ARM_SMMU_GR0_sACR: + case ARM_SMMU_GR0_sGFSR: + case ARM_SMMU_GR0_sGFSYNR0: + case ARM_SMMU_GR0_sGFSYNR1: + case ARM_SMMU_GR0_sGFSYNR2: + return offset + 0x400; + default: + return offset; + } +} + static void __iomem *arm_smmu_page(struct arm_smmu_device *smmu, int n) { return smmu->base + (n << smmu->pgshift); @@ -253,12 +255,18 @@ static void __iomem *arm_smmu_page(struct arm_smmu_device *smmu, int n) static u32 arm_smmu_readl(struct arm_smmu_device *smmu, int page, int offset) { + if ((smmu->options & ARM_SMMU_OPT_SECURE_CFG_ACCESS) && page == 0) + offset = arm_smmu_gr0_ns(offset); + return readl_relaxed(arm_smmu_page(smmu, page) + offset); } static void arm_smmu_writel(struct arm_smmu_device *smmu, int page, int offset, u32 val) { + if ((smmu->options & ARM_SMMU_OPT_SECURE_CFG_ACCESS) && page == 0) + offset = arm_smmu_gr0_ns(offset); + writel_relaxed(val, arm_smmu_page(smmu, page) + offset); } @@ -273,9 +281,15 @@ static void arm_smmu_writeq(struct arm_smmu_device *smmu, int page, int offset, writeq_relaxed(val, arm_smmu_page(smmu, page) + offset); } +#define ARM_SMMU_GR0 0 #define ARM_SMMU_GR1 1 #define ARM_SMMU_CB(s, n) ((s)->numpage + (n)) +#define arm_smmu_gr0_read(s, o)\ + arm_smmu_readl((s), ARM_SMMU_GR0, (o)) +#define arm_smmu_gr0_write(s, o, v)\ + arm_smmu_writel((s), ARM_SMMU_GR0, (o), (v)) + #define arm_smmu_gr1_read(s, o)\ arm_smmu_readl((s), ARM_SMMU_GR1, (o)) #define arm_smmu_gr1_write(s, o, v)\ @@ -470,7 +484,7 @@ static void arm_smmu_tlb_sync_global(struct arm_smmu_device *smmu) unsigned long flags; spin_lock_irqsave(>global_sync_lock, flags); - __arm_smmu_tlb_sync(smmu, 0, ARM_SMMU_GR0_sTLBGSYNC, + __arm_smmu_tlb_sync(smmu, ARM_SMMU_GR0, ARM_SMMU_GR0_sTLBGSYNC, ARM_SMMU_GR0_sTLBGSTATUS); 
spin_unlock_irqrestore(&smmu->global_sync_lock, flags); } @@ -511,10 +525,10 @@ static void arm_smmu_tlb_inv_context_s2(void *cookie) { struct arm_smmu_domain *smmu_domain = cookie; struct arm_smmu_device *smmu = smmu_domain->smmu; - void __iomem *base = ARM_SMMU_GR0(smmu); - /* NOTE: see above */ - writel(smmu_domain->cfg.vmid, base + ARM_SMMU_GR0_TLBIVMID); + /* See above */ + wmb(); + arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_TLBIVMID, smmu_domain->cfg.vmid); arm_smmu_tlb_sync_global(smmu); } @@ -579,12 +593,12 @@ static void arm_smmu_tlb_inv_vmid_nosync(unsigned long iova, size_t size, size_t granule, bool leaf, void *cookie) { struct arm_smmu_domain *smmu_domain = cookie; - void __iomem *base = ARM_SMMU_GR0(smmu_domain->smmu); + struct arm_smmu_device *smmu = smmu_domain->smmu; - if (smmu_domain->smmu->features & ARM_SMMU_FEAT_COHERENT_WALK) + if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK) wmb(); - writel_relaxed(smmu_domain->cfg.vmid, base + ARM_SMMU_GR0_TLBIVMID); + arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_TLBIVMID, smmu_domain->cfg.vmid); } static const struct iommu_gather_ops
[PATCH v2 17/17] iommu/arm-smmu: Add context init implementation hook
Allocating and initialising a context for a domain is another point where certain implementations are known to want special behaviour. Currently the other half of the Cavium workaround comes into play here, so let's finish the job to get the whole thing right out of the way. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu-impl.c | 42 ++--- drivers/iommu/arm-smmu.c | 51 +++ drivers/iommu/arm-smmu.h | 42 +++-- 3 files changed, 87 insertions(+), 48 deletions(-) diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index 4dc8b1c4befb..e22e9004f449 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -48,25 +48,60 @@ const struct arm_smmu_impl calxeda_impl = { }; +struct cavium_smmu { + struct arm_smmu_device smmu; + u32 id_base; +}; + static int cavium_cfg_probe(struct arm_smmu_device *smmu) { static atomic_t context_count = ATOMIC_INIT(0); + struct cavium_smmu *cs = container_of(smmu, struct cavium_smmu, smmu); /* * Cavium CN88xx erratum #27704. * Ensure ASID and VMID allocation is unique across all SMMUs in * the system. 
*/ - smmu->cavium_id_base = atomic_fetch_add(smmu->num_context_banks, - &context_count); + cs->id_base = atomic_fetch_add(smmu->num_context_banks, &context_count); dev_notice(smmu->dev, "\tenabling workaround for Cavium erratum 27704\n"); return 0; } +int cavium_init_context(struct arm_smmu_domain *smmu_domain) +{ + struct cavium_smmu *cs = container_of(smmu_domain->smmu, + struct cavium_smmu, smmu); + + if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2) + smmu_domain->cfg.vmid += cs->id_base; + else + smmu_domain->cfg.asid += cs->id_base; + + return 0; +} + const struct arm_smmu_impl cavium_impl = { .cfg_probe = cavium_cfg_probe, + .init_context = cavium_init_context, }; +struct arm_smmu_device *cavium_smmu_impl_init(struct arm_smmu_device *smmu) +{ + struct cavium_smmu *cs; + + cs = devm_kzalloc(smmu->dev, sizeof(*cs), GFP_KERNEL); + if (!cs) + return ERR_PTR(-ENOMEM); + + cs->smmu = *smmu; + cs->smmu.impl = &cavium_impl; + + devm_kfree(smmu->dev, smmu); + + return &cs->smmu; +} + #define ARM_MMU500_ACTLR_CPRE (1 << 1) @@ -126,8 +161,7 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) smmu->impl = &arm_mmu500_impl; break; case CAVIUM_SMMUV2: - smmu->impl = &cavium_impl; - break; + return cavium_smmu_impl_init(smmu); default: break; } diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index fc98992d120d..b8628e2ab579 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -27,7 +27,6 @@ #include #include #include -#include #include #include #include @@ -111,44 +110,6 @@ struct arm_smmu_master_cfg { #define for_each_cfg_sme(fw, i, idx) \ for (i = 0; idx = fwspec_smendx(fw, i), i < fw->num_ids; ++i) -enum arm_smmu_context_fmt { - ARM_SMMU_CTX_FMT_NONE, - ARM_SMMU_CTX_FMT_AARCH64, - ARM_SMMU_CTX_FMT_AARCH32_L, - ARM_SMMU_CTX_FMT_AARCH32_S, -}; - -struct arm_smmu_cfg { - u8 cbndx; - u8 irptndx; - union { - u16 asid; - u16 vmid; - }; - enum arm_smmu_cbar_type cbar; - enum arm_smmu_context_fmt fmt; -}; -#define INVALID_IRPTNDX 0xff - -enum 
arm_smmu_domain_stage { - ARM_SMMU_DOMAIN_S1 = 0, - ARM_SMMU_DOMAIN_S2, - ARM_SMMU_DOMAIN_NESTED, - ARM_SMMU_DOMAIN_BYPASS, -}; - -struct arm_smmu_domain { - struct arm_smmu_device *smmu; - struct io_pgtable_ops *pgtbl_ops; - const struct iommu_gather_ops *tlb_ops; - struct arm_smmu_cfg cfg; - enum arm_smmu_domain_stage stage; - boolnon_strict; - struct mutexinit_mutex; /* Protects smmu pointer */ - spinlock_t cb_lock; /* Serialises ATS1* ops and TLB syncs */ - struct iommu_domain domain; -}; - static bool using_legacy_binding, using_generic_binding; static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu) @@ -749,9 +710,16 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, } if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2) - cfg->vmid = cfg->cbndx + 1 + smmu->cavium_id_base; + cfg->vmid = cfg->cbndx + 1; else -
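For anyone unfamiliar with the pattern, the devm_kzalloc()/container_of() dance in cavium_smmu_impl_init() above is the standard C way of "subclassing" a structure. A rough userspace sketch of the idea, with a simplified stand-in for the kernel's container_of() and heavily trimmed structures:

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct arm_smmu_device {		/* heavily trimmed stand-in */
	int num_context_banks;
};

struct cavium_smmu {			/* "subclass" embedding the base by value */
	struct arm_smmu_device smmu;
	unsigned int id_base;
};

/* An impl hook sees only the base pointer, yet can recover its state */
unsigned int cavium_id_base(struct arm_smmu_device *smmu)
{
	struct cavium_smmu *cs = container_of(smmu, struct cavium_smmu, smmu);

	return cs->id_base;
}
```

Generic code only ever passes the embedded base pointer around; the Cavium hooks recover their private state from it, so no Cavium-specific field has to live in the shared structure.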
[PATCH v2 14/17] iommu/arm-smmu: Move Secure access quirk to implementation
Move detection of the Secure access quirk to its new home, trimming it down in the process - time has proven that boolean DT flags are neither ideal nor necessarily sufficient, so it's highly unlikely we'll ever add more, let alone enough to justify the frankly overengineered parsing machinery. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu-impl.c | 44 drivers/iommu/arm-smmu.c | 97 --- drivers/iommu/arm-smmu.h | 72 +- 3 files changed, 114 insertions(+), 99 deletions(-) diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index efeb6d78da17..0657c85580cb 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -4,10 +4,54 @@ #define pr_fmt(fmt) "arm-smmu: " fmt +#include + #include "arm-smmu.h" +static int arm_smmu_gr0_ns(int offset) +{ + switch(offset) { + case ARM_SMMU_GR0_sCR0: + case ARM_SMMU_GR0_sACR: + case ARM_SMMU_GR0_sGFSR: + case ARM_SMMU_GR0_sGFSYNR0: + case ARM_SMMU_GR0_sGFSYNR1: + case ARM_SMMU_GR0_sGFSYNR2: + return offset + 0x400; + default: + return offset; + } +} + +static u32 arm_smmu_read_ns(struct arm_smmu_device *smmu, int page, + int offset) +{ + if (page == ARM_SMMU_GR0) + offset = arm_smmu_gr0_ns(offset); + return readl_relaxed(arm_smmu_page(smmu, page) + offset); +} + +static void arm_smmu_write_ns(struct arm_smmu_device *smmu, int page, + int offset, u32 val) +{ + if (page == ARM_SMMU_GR0) + offset = arm_smmu_gr0_ns(offset); + writel_relaxed(val, arm_smmu_page(smmu, page) + offset); +} + +/* Since we don't care for sGFAR, we can do without 64-bit accessors */ +const struct arm_smmu_impl calxeda_impl = { + .read_reg = arm_smmu_read_ns, + .write_reg = arm_smmu_write_ns, +}; + + struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) { + if (of_property_read_bool(smmu->dev->of_node, + "calxeda,smmu-secure-config-access")) + smmu->impl = &calxeda_impl; + return smmu; } diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 1e8153182830..432d781f05f3 --- 
a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -155,91 +155,10 @@ struct arm_smmu_domain { struct iommu_domain domain; }; -static int arm_smmu_gr0_ns(int offset) -{ - switch(offset) { - case ARM_SMMU_GR0_sCR0: - case ARM_SMMU_GR0_sACR: - case ARM_SMMU_GR0_sGFSR: - case ARM_SMMU_GR0_sGFSYNR0: - case ARM_SMMU_GR0_sGFSYNR1: - case ARM_SMMU_GR0_sGFSYNR2: - return offset + 0x400; - default: - return offset; - } -} - -static void __iomem *arm_smmu_page(struct arm_smmu_device *smmu, int n) -{ - return smmu->base + (n << smmu->pgshift); -} - -static u32 arm_smmu_readl(struct arm_smmu_device *smmu, int page, int offset) -{ - if ((smmu->options & ARM_SMMU_OPT_SECURE_CFG_ACCESS) && page == 0) - offset = arm_smmu_gr0_ns(offset); - - return readl_relaxed(arm_smmu_page(smmu, page) + offset); -} - -static void arm_smmu_writel(struct arm_smmu_device *smmu, int page, int offset, - u32 val) -{ - if ((smmu->options & ARM_SMMU_OPT_SECURE_CFG_ACCESS) && page == 0) - offset = arm_smmu_gr0_ns(offset); - - writel_relaxed(val, arm_smmu_page(smmu, page) + offset); -} - -static u64 arm_smmu_readq(struct arm_smmu_device *smmu, int page, int offset) -{ - return readq_relaxed(arm_smmu_page(smmu, page) + offset); -} - -static void arm_smmu_writeq(struct arm_smmu_device *smmu, int page, int offset, - u64 val) -{ - writeq_relaxed(val, arm_smmu_page(smmu, page) + offset); -} - -#define ARM_SMMU_GR0 0 -#define ARM_SMMU_GR1 1 -#define ARM_SMMU_CB(s, n) ((s)->numpage + (n)) - -#define arm_smmu_gr0_read(s, o)\ - arm_smmu_readl((s), ARM_SMMU_GR0, (o)) -#define arm_smmu_gr0_write(s, o, v)\ - arm_smmu_writel((s), ARM_SMMU_GR0, (o), (v)) - -#define arm_smmu_gr1_read(s, o)\ - arm_smmu_readl((s), ARM_SMMU_GR1, (o)) -#define arm_smmu_gr1_write(s, o, v)\ - arm_smmu_writel((s), ARM_SMMU_GR1, (o), (v)) - -#define arm_smmu_cb_read(s, n, o) \ - arm_smmu_readl((s), ARM_SMMU_CB((s), (n)), (o)) -#define arm_smmu_cb_write(s, n, o, v) \ - arm_smmu_writel((s), ARM_SMMU_CB((s), (n)), (o), (v)) -#define 
arm_smmu_cb_readq(s, n, o) \ - arm_smmu_readq((s), ARM_SMMU_CB((s), (n)), (o)) -#define arm_smmu_cb_writeq(s, n, o, v) \ - arm_smmu_writeq((s), ARM_SMMU_CB((s), (n)), (o), (v)) - -struct arm_smmu_option_prop { - u32 opt; - const char *prop; -}; - static atomic_t
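The quirk itself boils down to a pure offset transformation: the handful of banked global registers gain a non-secure alias 0x400 bytes above the secure one. A standalone sketch of that mapping (offset values as in the SMMUv2 layout, reproduced here for illustration):

```c
/* GR0 offsets as in the SMMUv2 register layout (shown for illustration) */
#define ARM_SMMU_GR0_sCR0	0x0
#define ARM_SMMU_GR0_sACR	0x10
#define ARM_SMMU_GR0_sGFSR	0x48
#define ARM_SMMU_GR0_ID0	0x20	/* not banked: no secure alias */

/* Banked global registers have their non-secure alias 0x400 bytes up */
int arm_smmu_gr0_ns(int offset)
{
	switch (offset) {
	case ARM_SMMU_GR0_sCR0:
	case ARM_SMMU_GR0_sACR:
	case ARM_SMMU_GR0_sGFSR:
		return offset + 0x400;
	default:
		return offset;
	}
}
```

Because the remap is confined to the impl's read_reg/write_reg hooks, the architectural code never needs to know whether it is talking to the secure or non-secure view.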
[PATCH v2 15/17] iommu/arm-smmu: Add configuration implementation hook
Probing the ID registers and setting up the SMMU configuration is an area where overrides and workarounds may well be needed. Indeed, the Cavium workaround detection lives there at the moment, so let's break that out. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu-impl.c | 34 ++ drivers/iommu/arm-smmu.c | 17 +++-- drivers/iommu/arm-smmu.h | 1 + 3 files changed, 38 insertions(+), 14 deletions(-) diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index 0657c85580cb..696417908793 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -47,8 +47,42 @@ const struct arm_smmu_impl calxeda_impl = { }; +static int cavium_cfg_probe(struct arm_smmu_device *smmu) +{ + static atomic_t context_count = ATOMIC_INIT(0); + /* +* Cavium CN88xx erratum #27704. +* Ensure ASID and VMID allocation is unique across all SMMUs in +* the system. +*/ + smmu->cavium_id_base = atomic_fetch_add(smmu->num_context_banks, + &context_count); + dev_notice(smmu->dev, "\tenabling workaround for Cavium erratum 27704\n"); + + return 0; +} + +const struct arm_smmu_impl cavium_impl = { + .cfg_probe = cavium_cfg_probe, +}; + + struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) { + /* +* We will inevitably have to combine model-specific implementation +* quirks with platform-specific integration quirks, but everything +* we currently support happens to work out as straightforward +* mutually-exclusive assignments. 
+*/ + switch (smmu->model) { + case CAVIUM_SMMUV2: + smmu->impl = &cavium_impl; + break; + default: + break; + } + if (of_property_read_bool(smmu->dev->of_node, "calxeda,smmu-secure-config-access")) smmu->impl = &calxeda_impl; diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 432d781f05f3..362b6b5a28ee 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -155,8 +155,6 @@ struct arm_smmu_domain { struct iommu_domain domain; }; -static atomic_t cavium_smmu_context_count = ATOMIC_INIT(0); - static bool using_legacy_binding, using_generic_binding; static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu) @@ -1804,18 +1802,6 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu) } dev_notice(smmu->dev, "\t%u context banks (%u stage-2 only)\n", smmu->num_context_banks, smmu->num_s2_context_banks); - /* -* Cavium CN88xx erratum #27704. -* Ensure ASID and VMID allocation is unique across all SMMUs in -* the system. -*/ - if (smmu->model == CAVIUM_SMMUV2) { - smmu->cavium_id_base = - atomic_add_return(smmu->num_context_banks, - &cavium_smmu_context_count); - smmu->cavium_id_base -= smmu->num_context_banks; - dev_notice(smmu->dev, "\tenabling workaround for Cavium erratum 27704\n"); - } smmu->cbs = devm_kcalloc(smmu->dev, smmu->num_context_banks, sizeof(*smmu->cbs), GFP_KERNEL); if (!smmu->cbs) @@ -1884,6 +1870,9 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu) dev_notice(smmu->dev, "\tStage-2: %lu-bit IPA -> %lu-bit PA\n", smmu->ipa_size, smmu->pa_size); + if (smmu->impl && smmu->impl->cfg_probe) + return smmu->impl->cfg_probe(smmu); + return 0; } diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h index d4fd29d70705..f4e90f33fce2 100644 --- a/drivers/iommu/arm-smmu.h +++ b/drivers/iommu/arm-smmu.h @@ -287,6 +287,7 @@ struct arm_smmu_impl { u64 (*read_reg64)(struct arm_smmu_device *smmu, int page, int offset); void (*write_reg64)(struct arm_smmu_device *smmu, int page, int offset, u64 val); + int 
(*cfg_probe)(struct arm_smmu_device *smmu); }; static inline void __iomem *arm_smmu_page(struct arm_smmu_device *smmu, int n) -- 2.21.0.dirty ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
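The calling convention this patch establishes is worth spelling out: every member of arm_smmu_impl is optional, and the architectural code NULL-checks before dispatching. A minimal self-contained sketch of that pattern, using hypothetical trimmed-down structures:

```c
#include <stddef.h>

struct arm_smmu_device;

/* Per-implementation overrides; every member is optional and may be NULL */
struct arm_smmu_impl {
	int (*cfg_probe)(struct arm_smmu_device *smmu);
};

struct arm_smmu_device {		/* trimmed-down stand-in */
	const struct arm_smmu_impl *impl;
	int cavium_id_base;
};

static int cavium_cfg_probe(struct arm_smmu_device *smmu)
{
	smmu->cavium_id_base = 42;	/* stand-in for the real bookkeeping */
	return 0;
}

const struct arm_smmu_impl cavium_impl = {
	.cfg_probe = cavium_cfg_probe,
};

/* Architectural code: run the hook only when an impl provides one */
int device_cfg_probe(struct arm_smmu_device *smmu)
{
	if (smmu->impl && smmu->impl->cfg_probe)
		return smmu->impl->cfg_probe(smmu);
	return 0;
}
```

A generic SMMU with no impl (or an impl without cfg_probe) takes the default path; quirky hardware pays for its quirk only when the hook is populated.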
[PATCH v2 16/17] iommu/arm-smmu: Add reset implementation hook
Reset is an activity rife with implementation-defined poking. Add a corresponding hook, and use it to encapsulate the existing MMU-500 details. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu-impl.c | 49 +++ drivers/iommu/arm-smmu.c | 39 +++- drivers/iommu/arm-smmu.h | 1 + 3 files changed, 54 insertions(+), 35 deletions(-) diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index 696417908793..4dc8b1c4befb 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -4,6 +4,7 @@ #define pr_fmt(fmt) "arm-smmu: " fmt +#include #include #include "arm-smmu.h" @@ -67,6 +68,51 @@ const struct arm_smmu_impl cavium_impl = { }; +#define ARM_MMU500_ACTLR_CPRE (1 << 1) + +#define ARM_MMU500_ACR_CACHE_LOCK (1 << 26) +#define ARM_MMU500_ACR_S2CRB_TLBEN (1 << 10) +#define ARM_MMU500_ACR_SMTNMB_TLBEN(1 << 8) + +static int arm_mmu500_reset(struct arm_smmu_device *smmu) +{ + u32 reg, major; + int i; + /* +* On MMU-500 r2p0 onwards we need to clear ACR.CACHE_LOCK before +* writes to the context bank ACTLRs will stick. And we just hope that +* Secure has also cleared SACR.CACHE_LOCK for this to take effect... +*/ + reg = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_ID7); + major = FIELD_GET(ID7_MAJOR, reg); + reg = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sACR); + if (major >= 2) + reg &= ~ARM_MMU500_ACR_CACHE_LOCK; + /* +* Allow unmatched Stream IDs to allocate bypass +* TLB entries for reduced latency. +*/ + reg |= ARM_MMU500_ACR_SMTNMB_TLBEN | ARM_MMU500_ACR_S2CRB_TLBEN; + arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_sACR, reg); + + /* +* Disable MMU-500's not-particularly-beneficial next-page +* prefetcher for the sake of errata #841119 and #826419. 
+*/ + for (i = 0; i < smmu->num_context_banks; ++i) { + reg = arm_smmu_cb_read(smmu, i, ARM_SMMU_CB_ACTLR); + reg &= ~ARM_MMU500_ACTLR_CPRE; + arm_smmu_cb_write(smmu, i, ARM_SMMU_CB_ACTLR, reg); + } + + return 0; +} + +const struct arm_smmu_impl arm_mmu500_impl = { + .reset = arm_mmu500_reset, +}; + + struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) { /* @@ -76,6 +122,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) * mutually-exclusive assignments. */ switch (smmu->model) { + case ARM_MMU500: + smmu->impl = &arm_mmu500_impl; + break; case CAVIUM_SMMUV2: smmu->impl = &cavium_impl; break; diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 362b6b5a28ee..fc98992d120d 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -54,12 +54,6 @@ */ #define QCOM_DUMMY_VAL -1 -#define ARM_MMU500_ACTLR_CPRE (1 << 1) - -#define ARM_MMU500_ACR_CACHE_LOCK (1 << 26) -#define ARM_MMU500_ACR_S2CRB_TLBEN (1 << 10) -#define ARM_MMU500_ACR_SMTNMB_TLBEN (1 << 8) - #define TLB_LOOP_TIMEOUT 100 /* 1s! */ #define TLB_SPIN_COUNT 10 @@ -1574,7 +1568,7 @@ static struct iommu_ops arm_smmu_ops = { static void arm_smmu_device_reset(struct arm_smmu_device *smmu) { int i; - u32 reg, major; + u32 reg; /* clear global FSR */ reg = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSR); @@ -1587,38 +1581,10 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu) for (i = 0; i < smmu->num_mapping_groups; ++i) arm_smmu_write_sme(smmu, i); - if (smmu->model == ARM_MMU500) { - /* -* Before clearing ARM_MMU500_ACTLR_CPRE, need to -* clear CACHE_LOCK bit of ACR first. And, CACHE_LOCK -* bit is only present in MMU-500r2 onwards. -*/ - reg = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_ID7); - major = FIELD_GET(ID7_MAJOR, reg); - reg = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sACR); - if (major >= 2) - reg &= ~ARM_MMU500_ACR_CACHE_LOCK; - /* -* Allow unmatched Stream IDs to allocate bypass -* TLB entries for reduced latency. 
-*/ - reg |= ARM_MMU500_ACR_SMTNMB_TLBEN | ARM_MMU500_ACR_S2CRB_TLBEN; - arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_sACR, reg); - } - /* Make sure all context banks are disabled and clear CB_FSR */ for (i = 0; i < smmu->num_context_banks; ++i) { arm_smmu_write_context_bank(smmu, i); arm_smmu_cb_write(smmu, i, ARM_SMMU_CB_FSR, FSR_FAULT); - /* -* Disable MMU-500's not-particularly-beneficial next-page -
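Separated from the MMIO plumbing, the sACR fixup in arm_mmu500_reset() is a pure bit manipulation, which makes the version dependency easy to see. A sketch, assuming the bit positions from the patch:

```c
#include <stdint.h>

#define ARM_MMU500_ACR_CACHE_LOCK	(1u << 26)
#define ARM_MMU500_ACR_S2CRB_TLBEN	(1u << 10)
#define ARM_MMU500_ACR_SMTNMB_TLBEN	(1u << 8)

/* Pure version of the sACR fixup performed by arm_mmu500_reset() */
uint32_t mmu500_fixup_acr(uint32_t acr, unsigned int major)
{
	/*
	 * CACHE_LOCK only exists from r2p0 onwards, and must be cleared
	 * there before writes to the context bank ACTLRs will stick.
	 */
	if (major >= 2)
		acr &= ~ARM_MMU500_ACR_CACHE_LOCK;
	/* Let unmatched Stream IDs allocate bypass TLB entries */
	acr |= ARM_MMU500_ACR_SMTNMB_TLBEN | ARM_MMU500_ACR_S2CRB_TLBEN;
	return acr;
}
```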
[PATCH v2 12/17] iommu/arm-smmu: Rename arm-smmu-regs.h
We're about to start using it for more than just register definitions, so generalise the name. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu.c | 2 +- drivers/iommu/{arm-smmu-regs.h => arm-smmu.h} | 6 +++--- drivers/iommu/qcom_iommu.c | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) rename drivers/iommu/{arm-smmu-regs.h => arm-smmu.h} (98%) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index e9fd9117109e..f3b8301a3059 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -46,7 +46,7 @@ #include #include -#include "arm-smmu-regs.h" +#include "arm-smmu.h" /* * Apparently, some Qualcomm arm64 platforms which appear to expose their SMMU diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu.h similarity index 98% rename from drivers/iommu/arm-smmu-regs.h rename to drivers/iommu/arm-smmu.h index a8e288192285..ccc3097a4247 100644 --- a/drivers/iommu/arm-smmu-regs.h +++ b/drivers/iommu/arm-smmu.h @@ -7,8 +7,8 @@ * Author: Will Deacon */ -#ifndef _ARM_SMMU_REGS_H -#define _ARM_SMMU_REGS_H +#ifndef _ARM_SMMU_H +#define _ARM_SMMU_H #include @@ -194,4 +194,4 @@ enum arm_smmu_cbar_type { #define ARM_SMMU_CB_ATSR 0x8f0 #define ATSR_ACTIVE BIT(0) -#endif /* _ARM_SMMU_REGS_H */ +#endif /* _ARM_SMMU_H */ diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c index 60a125dd7300..a2062d13584f 100644 --- a/drivers/iommu/qcom_iommu.c +++ b/drivers/iommu/qcom_iommu.c @@ -33,7 +33,7 @@ #include #include -#include "arm-smmu-regs.h" +#include "arm-smmu.h" #define SMMU_INTR_SEL_NS 0x2000 -- 2.21.0.dirty
[PATCH v2 07/17] iommu/arm-smmu: Split arm_smmu_tlb_inv_range_nosync()
Since we now use separate iommu_gather_ops for stage 1 and stage 2 contexts, we may as well divide up the monolithic callback into its respective stage 1 and stage 2 parts. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu.c | 66 ++-- 1 file changed, 37 insertions(+), 29 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 19126230c780..5b12e96d7878 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -490,46 +490,54 @@ static void arm_smmu_tlb_inv_context_s2(void *cookie) arm_smmu_tlb_sync_global(smmu); } -static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size, - size_t granule, bool leaf, void *cookie) +static void arm_smmu_tlb_inv_range_s1(unsigned long iova, size_t size, + size_t granule, bool leaf, void *cookie) { struct arm_smmu_domain *smmu_domain = cookie; + struct arm_smmu_device *smmu = smmu_domain->smmu; struct arm_smmu_cfg *cfg = &smmu_domain->cfg; - bool stage1 = cfg->cbar != CBAR_TYPE_S2_TRANS; - void __iomem *reg = ARM_SMMU_CB(smmu_domain->smmu, cfg->cbndx); + void __iomem *reg = ARM_SMMU_CB(smmu, cfg->cbndx); - if (smmu_domain->smmu->features & ARM_SMMU_FEAT_COHERENT_WALK) + if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK) wmb(); - if (stage1) { - reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL : ARM_SMMU_CB_S1_TLBIVA; + reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL : ARM_SMMU_CB_S1_TLBIVA; - if (cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) { - iova = (iova >> 12) << 12; - iova |= cfg->asid; - do { - writel_relaxed(iova, reg); - iova += granule; - } while (size -= granule); - } else { - iova >>= 12; - iova |= (u64)cfg->asid << 48; - do { - writeq_relaxed(iova, reg); - iova += granule >> 12; - } while (size -= granule); - } - } else { - reg += leaf ? 
ARM_SMMU_CB_S2_TLBIIPAS2L : - ARM_SMMU_CB_S2_TLBIIPAS2; - iova >>= 12; + if (cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) { + iova = (iova >> 12) << 12; + iova |= cfg->asid; do { - smmu_write_atomic_lq(iova, reg); + writel_relaxed(iova, reg); + iova += granule; + } while (size -= granule); + } else { + iova >>= 12; + iova |= (u64)cfg->asid << 48; + do { + writeq_relaxed(iova, reg); iova += granule >> 12; } while (size -= granule); } } +static void arm_smmu_tlb_inv_range_s2(unsigned long iova, size_t size, + size_t granule, bool leaf, void *cookie) +{ + struct arm_smmu_domain *smmu_domain = cookie; + struct arm_smmu_device *smmu = smmu_domain->smmu; + void __iomem *reg = ARM_SMMU_CB(smmu, smmu_domain->cfg.cbndx); + + if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK) + wmb(); + + reg += leaf ? ARM_SMMU_CB_S2_TLBIIPAS2L : ARM_SMMU_CB_S2_TLBIIPAS2; + iova >>= 12; + do { + smmu_write_atomic_lq(iova, reg); + iova += granule >> 12; + } while (size -= granule); +} + /* * On MMU-401 at least, the cost of firing off multiple TLBIVMIDs appears * almost negligible, but the benefit of getting the first one in as far ahead @@ -550,13 +558,13 @@ static void arm_smmu_tlb_inv_vmid_nosync(unsigned long iova, size_t size, static const struct iommu_gather_ops arm_smmu_s1_tlb_ops = { .tlb_flush_all = arm_smmu_tlb_inv_context_s1, - .tlb_add_flush = arm_smmu_tlb_inv_range_nosync, + .tlb_add_flush = arm_smmu_tlb_inv_range_s1, .tlb_sync = arm_smmu_tlb_sync_context, }; static const struct iommu_gather_ops arm_smmu_s2_tlb_ops_v2 = { .tlb_flush_all = arm_smmu_tlb_inv_context_s2, - .tlb_add_flush = arm_smmu_tlb_inv_range_nosync, + .tlb_add_flush = arm_smmu_tlb_inv_range_s2, .tlb_sync = arm_smmu_tlb_sync_context, }; -- 2.21.0.dirty
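The reason the monolithic callback wanted splitting is visible in the two stage-1 address encodings: AArch32 formats keep a page-aligned VA with the ASID in the low bits, while AArch64 formats use the VA page number with the ASID in bits [63:48]. As standalone helpers (a sketch of the encodings as written in the patch):

```c
#include <stdint.h>

/* AArch32 formats: page-aligned VA, ASID in the low bits, 32-bit write */
uint32_t tlbiva_aarch32(uint32_t iova, uint16_t asid)
{
	return ((iova >> 12) << 12) | asid;
}

/* AArch64 formats: VA page number, ASID in bits [63:48], 64-bit write */
uint64_t tlbiva_aarch64(uint64_t iova, uint16_t asid)
{
	return (iova >> 12) | ((uint64_t)asid << 48);
}
```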
[PATCH v2 13/17] iommu/arm-smmu: Add implementation infrastructure
Add some nascent infrastructure for handling implementation-specific details outside the flow of the architectural code. This will allow us to keep mutually-incompatible vendor-specific hooks in their own files where the respective interested parties can maintain them with minimal chance of conflicts. As somewhat of a template, we'll start with a general place to collect the relatively trivial existing quirks. Signed-off-by: Robin Murphy --- MAINTAINERS | 3 +- drivers/iommu/Makefile| 2 +- drivers/iommu/arm-smmu-impl.c | 13 + drivers/iommu/arm-smmu.c | 82 ++-- drivers/iommu/arm-smmu.h | 89 +++ 5 files changed, 108 insertions(+), 81 deletions(-) create mode 100644 drivers/iommu/arm-smmu-impl.c diff --git a/MAINTAINERS b/MAINTAINERS index 6426db5198f0..35ff49ac303b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1350,8 +1350,7 @@ M:Will Deacon R: Robin Murphy L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers) S: Maintained -F: drivers/iommu/arm-smmu.c -F: drivers/iommu/arm-smmu-v3.c +F: drivers/iommu/arm-smmu* F: drivers/iommu/io-pgtable-arm.c F: drivers/iommu/io-pgtable-arm-v7s.c diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index f13f36ae1af6..a2729aadd300 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -13,7 +13,7 @@ obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o -obj-$(CONFIG_ARM_SMMU) += arm-smmu.o +obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o obj-$(CONFIG_DMAR_TABLE) += dmar.o obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c new file mode 100644 index ..efeb6d78da17 --- /dev/null +++ b/drivers/iommu/arm-smmu-impl.c @@ -0,0 +1,13 @@ +// SPDX-License-Identifier: GPL-2.0-only +// Miscellaneous Arm SMMU implementation and integration 
quirks +// Copyright (C) 2019 Arm Limited + +#define pr_fmt(fmt) "arm-smmu: " fmt + +#include "arm-smmu.h" + + +struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) +{ + return smmu; +} diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index f3b8301a3059..1e8153182830 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -19,7 +19,6 @@ #include #include -#include #include #include #include @@ -29,7 +28,6 @@ #include #include #include -#include #include #include #include @@ -41,7 +39,6 @@ #include #include #include -#include #include #include @@ -66,9 +63,6 @@ #define TLB_LOOP_TIMEOUT 100 /* 1s! */ #define TLB_SPIN_COUNT 10 -/* Maximum number of context banks per SMMU */ -#define ARM_SMMU_MAX_CBS 128 - #define MSI_IOVA_BASE 0x800 #define MSI_IOVA_LENGTH0x10 @@ -86,19 +80,6 @@ module_param(disable_bypass, bool, S_IRUGO); MODULE_PARM_DESC(disable_bypass, "Disable bypass streams such that incoming transactions from devices that are not attached to an iommu domain will report an abort back to the device and will not be allowed to pass through the SMMU."); -enum arm_smmu_arch_version { - ARM_SMMU_V1, - ARM_SMMU_V1_64K, - ARM_SMMU_V2, -}; - -enum arm_smmu_implementation { - GENERIC_SMMU, - ARM_MMU500, - CAVIUM_SMMUV2, - QCOM_SMMUV2, -}; - struct arm_smmu_s2cr { struct iommu_group *group; int count; @@ -136,65 +117,6 @@ struct arm_smmu_master_cfg { #define for_each_cfg_sme(fw, i, idx) \ for (i = 0; idx = fwspec_smendx(fw, i), i < fw->num_ids; ++i) -struct arm_smmu_device { - struct device *dev; - - void __iomem*base; - unsigned intnumpage; - unsigned intpgshift; - -#define ARM_SMMU_FEAT_COHERENT_WALK(1 << 0) -#define ARM_SMMU_FEAT_STREAM_MATCH (1 << 1) -#define ARM_SMMU_FEAT_TRANS_S1 (1 << 2) -#define ARM_SMMU_FEAT_TRANS_S2 (1 << 3) -#define ARM_SMMU_FEAT_TRANS_NESTED (1 << 4) -#define ARM_SMMU_FEAT_TRANS_OPS(1 << 5) -#define ARM_SMMU_FEAT_VMID16 (1 << 6) -#define ARM_SMMU_FEAT_FMT_AARCH64_4K (1 << 7) -#define 
ARM_SMMU_FEAT_FMT_AARCH64_16K (1 << 8) -#define ARM_SMMU_FEAT_FMT_AARCH64_64K (1 << 9) -#define ARM_SMMU_FEAT_FMT_AARCH32_L(1 << 10) -#define ARM_SMMU_FEAT_FMT_AARCH32_S(1 << 11) -#define ARM_SMMU_FEAT_EXIDS(1 << 12) - u32 features; - -#define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0) - u32 options; -
[PATCH v2 10/17] iommu/arm-smmu: Abstract context bank accesses
Context bank accesses are fiddly enough to deserve a number of extra helpers to keep the callsites looking sane, even though there are only one or two of each. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu.c | 138 +-- 1 file changed, 73 insertions(+), 65 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index d612dda2889f..e72554f334ee 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -82,9 +82,6 @@ ((smmu->options & ARM_SMMU_OPT_SECURE_CFG_ACCESS) \ ? 0x400 : 0)) -/* Translation context bank */ -#define ARM_SMMU_CB(smmu, n) ((smmu)->base + (((smmu)->numpage + (n)) << (smmu)->pgshift)) - #define MSI_IOVA_BASE 0x800 #define MSI_IOVA_LENGTH0x10 @@ -265,13 +262,34 @@ static void arm_smmu_writel(struct arm_smmu_device *smmu, int page, int offset, writel_relaxed(val, arm_smmu_page(smmu, page) + offset); } +static u64 arm_smmu_readq(struct arm_smmu_device *smmu, int page, int offset) +{ + return readq_relaxed(arm_smmu_page(smmu, page) + offset); +} + +static void arm_smmu_writeq(struct arm_smmu_device *smmu, int page, int offset, + u64 val) +{ + writeq_relaxed(val, arm_smmu_page(smmu, page) + offset); +} + #define ARM_SMMU_GR1 1 +#define ARM_SMMU_CB(s, n) ((s)->numpage + (n)) #define arm_smmu_gr1_read(s, o)\ arm_smmu_readl((s), ARM_SMMU_GR1, (o)) #define arm_smmu_gr1_write(s, o, v)\ arm_smmu_writel((s), ARM_SMMU_GR1, (o), (v)) +#define arm_smmu_cb_read(s, n, o) \ + arm_smmu_readl((s), ARM_SMMU_CB((s), (n)), (o)) +#define arm_smmu_cb_write(s, n, o, v) \ + arm_smmu_writel((s), ARM_SMMU_CB((s), (n)), (o), (v)) +#define arm_smmu_cb_readq(s, n, o) \ + arm_smmu_readq((s), ARM_SMMU_CB((s), (n)), (o)) +#define arm_smmu_cb_writeq(s, n, o, v) \ + arm_smmu_writeq((s), ARM_SMMU_CB((s), (n)), (o), (v)) + struct arm_smmu_option_prop { u32 opt; const char *prop; @@ -427,15 +445,17 @@ static void __arm_smmu_free_bitmap(unsigned long *map, int idx) } /* Wait for any pending TLB invalidations to complete */ -static void 
__arm_smmu_tlb_sync(struct arm_smmu_device *smmu, - void __iomem *sync, void __iomem *status) +static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu, int page, + int sync, int status) { unsigned int spin_cnt, delay; + u32 reg; - writel_relaxed(QCOM_DUMMY_VAL, sync); + arm_smmu_writel(smmu, page, sync, QCOM_DUMMY_VAL); for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) { for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) { - if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE)) + reg = arm_smmu_readl(smmu, page, status); + if (!(reg & sTLBGSTATUS_GSACTIVE)) return; cpu_relax(); } @@ -447,12 +467,11 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu, static void arm_smmu_tlb_sync_global(struct arm_smmu_device *smmu) { - void __iomem *base = ARM_SMMU_GR0(smmu); unsigned long flags; spin_lock_irqsave(&smmu->global_sync_lock, flags); - __arm_smmu_tlb_sync(smmu, base + ARM_SMMU_GR0_sTLBGSYNC, - base + ARM_SMMU_GR0_sTLBGSTATUS); + __arm_smmu_tlb_sync(smmu, 0, ARM_SMMU_GR0_sTLBGSYNC, + ARM_SMMU_GR0_sTLBGSTATUS); spin_unlock_irqrestore(&smmu->global_sync_lock, flags); } @@ -460,12 +479,11 @@ static void arm_smmu_tlb_sync_context(void *cookie) { struct arm_smmu_domain *smmu_domain = cookie; struct arm_smmu_device *smmu = smmu_domain->smmu; - void __iomem *base = ARM_SMMU_CB(smmu, smmu_domain->cfg.cbndx); unsigned long flags; spin_lock_irqsave(&smmu_domain->cb_lock, flags); - __arm_smmu_tlb_sync(smmu, base + ARM_SMMU_CB_TLBSYNC, - base + ARM_SMMU_CB_TLBSTATUS); + __arm_smmu_tlb_sync(smmu, ARM_SMMU_CB(smmu, smmu_domain->cfg.cbndx), + ARM_SMMU_CB_TLBSYNC, ARM_SMMU_CB_TLBSTATUS); spin_unlock_irqrestore(&smmu_domain->cb_lock, flags); } @@ -479,14 +497,13 @@ static void arm_smmu_tlb_sync_vmid(void *cookie) static void arm_smmu_tlb_inv_context_s1(void *cookie) { struct arm_smmu_domain *smmu_domain = cookie; - struct arm_smmu_cfg *cfg = &smmu_domain->cfg; - void __iomem *base = ARM_SMMU_CB(smmu_domain->smmu, cfg->cbndx); - /* -* NOTE: this is not a relaxed write; it needs to 
guarantee that PTEs -* cleared by the current CPU are visible to the SMMU before the TLBI. +* The TLBI write may be relaxed, so ensure that PTEs cleared by the +* current
[PATCH v2 09/17] iommu/arm-smmu: Abstract GR1 accesses
Introduce some register access abstractions which we will later use to encapsulate various quirks. GR1 is the easiest page to start with. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu.c | 34 +++--- 1 file changed, 27 insertions(+), 7 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 24b4de1a4185..d612dda2889f 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -71,7 +71,6 @@ /* SMMU global address space */ #define ARM_SMMU_GR0(smmu) ((smmu)->base) -#define ARM_SMMU_GR1(smmu) ((smmu)->base + (1 << (smmu)->pgshift)) /* * SMMU global address space with conditional offset to access secure @@ -250,6 +249,29 @@ struct arm_smmu_domain { struct iommu_domain domain; }; +static void __iomem *arm_smmu_page(struct arm_smmu_device *smmu, int n) +{ + return smmu->base + (n << smmu->pgshift); +} + +static u32 arm_smmu_readl(struct arm_smmu_device *smmu, int page, int offset) +{ + return readl_relaxed(arm_smmu_page(smmu, page) + offset); +} + +static void arm_smmu_writel(struct arm_smmu_device *smmu, int page, int offset, + u32 val) +{ + writel_relaxed(val, arm_smmu_page(smmu, page) + offset); +} + +#define ARM_SMMU_GR1 1 + +#define arm_smmu_gr1_read(s, o)\ + arm_smmu_readl((s), ARM_SMMU_GR1, (o)) +#define arm_smmu_gr1_write(s, o, v)\ + arm_smmu_writel((s), ARM_SMMU_GR1, (o), (v)) + struct arm_smmu_option_prop { u32 opt; const char *prop; @@ -574,7 +596,6 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev) struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct arm_smmu_cfg *cfg = &smmu_domain->cfg; struct arm_smmu_device *smmu = smmu_domain->smmu; - void __iomem *gr1_base = ARM_SMMU_GR1(smmu); void __iomem *cb_base; cb_base = ARM_SMMU_CB(smmu, cfg->cbndx); @@ -585,7 +606,7 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev) fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0); iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR); - cbfrsynra = readl_relaxed(gr1_base + 
ARM_SMMU_GR1_CBFRSYNRA(cfg->cbndx)); + cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(cfg->cbndx)); dev_err_ratelimited(smmu->dev, "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n", @@ -676,7 +697,7 @@ static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx) bool stage1; struct arm_smmu_cb *cb = &smmu->cbs[idx]; struct arm_smmu_cfg *cfg = cb->cfg; - void __iomem *cb_base, *gr1_base; + void __iomem *cb_base; cb_base = ARM_SMMU_CB(smmu, idx); @@ -686,7 +707,6 @@ static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx) return; } - gr1_base = ARM_SMMU_GR1(smmu); stage1 = cfg->cbar != CBAR_TYPE_S2_TRANS; /* CBA2R */ @@ -699,7 +719,7 @@ static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx) if (smmu->features & ARM_SMMU_FEAT_VMID16) reg |= FIELD_PREP(CBA2R_VMID16, cfg->vmid); - writel_relaxed(reg, gr1_base + ARM_SMMU_GR1_CBA2R(idx)); + arm_smmu_gr1_write(smmu, ARM_SMMU_GR1_CBA2R(idx), reg); } /* CBAR */ @@ -718,7 +738,7 @@ static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx) /* 8-bit VMIDs live in CBAR */ reg |= FIELD_PREP(CBAR_VMID, cfg->vmid); } - writel_relaxed(reg, gr1_base + ARM_SMMU_GR1_CBAR(idx)); + arm_smmu_gr1_write(smmu, ARM_SMMU_GR1_CBAR(idx), reg); /* * TCR -- 2.21.0.dirty
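The abstraction rests on the observation that the SMMU register file is a sequence of equally-sized pages: page n starts n << pgshift bytes from the base, with GR0 at page 0 and GR1 at page 1. A userspace sketch of the addressing (uintptr_t standing in for the driver's __iomem pointer):

```c
#include <stdint.h>

#define ARM_SMMU_GR0	0
#define ARM_SMMU_GR1	1

struct arm_smmu_device {		/* trimmed-down stand-in */
	uintptr_t base;		/* stand-in for the ioremapped __iomem base */
	unsigned int pgshift;	/* 12 for 4K register pages, 16 for 64K */
};

/* Page n of the register file starts n << pgshift bytes from the base */
uintptr_t arm_smmu_page(const struct arm_smmu_device *smmu, int n)
{
	return smmu->base + ((uintptr_t)n << smmu->pgshift);
}
```

With (page, offset) pairs instead of raw pointers, an implementation can intercept every access in one place, which is exactly what the later Calxeda secure-alias hook relies on.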
[PATCH v2 08/17] iommu/arm-smmu: Get rid of weird "atomic" write
The smmu_write_atomic_lq oddity made some sense when the context format was effectively tied to CONFIG_64BIT, but these days it's simpler to just pick an explicit access size based on the format for the one-and-a-half times we actually care. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu.c | 23 +++ 1 file changed, 7 insertions(+), 16 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 5b12e96d7878..24b4de1a4185 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -83,17 +83,6 @@ ((smmu->options & ARM_SMMU_OPT_SECURE_CFG_ACCESS) \ ? 0x400 : 0)) -/* - * Some 64-bit registers only make sense to write atomically, but in such - * cases all the data relevant to AArch32 formats lies within the lower word, - * therefore this actually makes more sense than it might first appear. - */ -#ifdef CONFIG_64BIT -#define smmu_write_atomic_lq writeq_relaxed -#else -#define smmu_write_atomic_lq writel_relaxed -#endif - /* Translation context bank */ #define ARM_SMMU_CB(smmu, n) ((smmu)->base + (((smmu)->numpage + (n)) << (smmu)->pgshift)) @@ -533,7 +522,10 @@ static void arm_smmu_tlb_inv_range_s2(unsigned long iova, size_t size, reg += leaf ? 
ARM_SMMU_CB_S2_TLBIIPAS2L : ARM_SMMU_CB_S2_TLBIIPAS2; iova >>= 12; do { - smmu_write_atomic_lq(iova, reg); + if (smmu_domain->cfg.fmt == ARM_SMMU_CTX_FMT_AARCH64) + writeq_relaxed(iova, reg); + else + writel_relaxed(iova, reg); iova += granule >> 12; } while (size -= granule); } @@ -1371,11 +1363,10 @@ static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain, cb_base = ARM_SMMU_CB(smmu, cfg->cbndx); spin_lock_irqsave(_domain->cb_lock, flags); - /* ATS1 registers can only be written atomically */ va = iova & ~0xfffUL; - if (smmu->version == ARM_SMMU_V2) - smmu_write_atomic_lq(va, cb_base + ARM_SMMU_CB_ATS1PR); - else /* Register is only 32-bit in v1 */ + if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) + writeq_relaxed(va, cb_base + ARM_SMMU_CB_ATS1PR); + else writel_relaxed(va, cb_base + ARM_SMMU_CB_ATS1PR); if (readl_poll_timeout_atomic(cb_base + ARM_SMMU_CB_ATSR, tmp, -- 2.21.0.dirty ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
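The replacement of `smmu_write_atomic_lq` with an explicit per-format access size can be illustrated in userspace, with a byte buffer standing in for the register so the write width is observable (names are illustrative):

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

enum ctx_fmt { FMT_AARCH32_LPAE, FMT_AARCH64 };

/* Stand-in register backing store so the access width can be checked. */
static uint8_t reg[8];

static void writel_relaxed(uint32_t v, uint8_t *addr) { memcpy(addr, &v, 4); }
static void writeq_relaxed(uint64_t v, uint8_t *addr) { memcpy(addr, &v, 8); }

/*
 * Instead of a CONFIG_64BIT-dependent macro, pick the access size from
 * the context format, as the patch does: AArch64 contexts get a 64-bit
 * write, AArch32 contexts only ever care about the low word anyway.
 */
static void tlbi_write(enum ctx_fmt fmt, uint64_t iova)
{
    if (fmt == FMT_AARCH64)
        writeq_relaxed(iova, reg);
    else
        writel_relaxed((uint32_t)iova, reg);
}
```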
[PATCH v2 06/17] iommu/arm-smmu: Rework cb_base handling
To keep register-access quirks manageable, we want to structure things to avoid needing too many individual overrides. It seems fairly clean to have a single interface which handles both global and context registers in terms of the architectural pages, so the first preparatory step is to rework cb_base into a page number rather than an absolute address. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu.c | 25 +++-- 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index a877de006d02..19126230c780 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -95,7 +95,7 @@ #endif /* Translation context bank */ -#define ARM_SMMU_CB(smmu, n) ((smmu)->cb_base + ((n) << (smmu)->pgshift)) +#define ARM_SMMU_CB(smmu, n) ((smmu)->base + (((smmu)->numpage + (n)) << (smmu)->pgshift)) #define MSI_IOVA_BASE 0x800 #define MSI_IOVA_LENGTH0x10 @@ -168,8 +168,8 @@ struct arm_smmu_device { struct device *dev; void __iomem*base; - void __iomem*cb_base; - unsigned long pgshift; + unsigned intnumpage; + unsigned intpgshift; #define ARM_SMMU_FEAT_COHERENT_WALK(1 << 0) #define ARM_SMMU_FEAT_STREAM_MATCH (1 << 1) @@ -1815,7 +1815,7 @@ static int arm_smmu_id_size_to_bits(int size) static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu) { - unsigned long size; + unsigned int size; void __iomem *gr0_base = ARM_SMMU_GR0(smmu); u32 id; bool cttw_reg, cttw_fw = smmu->features & ARM_SMMU_FEAT_COHERENT_WALK; @@ -1899,7 +1899,7 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu) return -ENOMEM; dev_notice(smmu->dev, - "\tstream matching with %lu register groups", size); + "\tstream matching with %u register groups", size); } /* s2cr->type == 0 means translation, so initialise explicitly */ smmu->s2crs = devm_kmalloc_array(smmu->dev, size, sizeof(*smmu->s2crs), @@ -1925,11 +1925,12 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu) /* Check for size mismatch of SMMU 
address space from mapped region */ size = 1 << (FIELD_GET(ID1_NUMPAGENDXB, id) + 1); - size <<= smmu->pgshift; - if (smmu->cb_base != gr0_base + size) + if (smmu->numpage != 2 * size << smmu->pgshift) dev_warn(smmu->dev, - "SMMU address space size (0x%lx) differs from mapped region size (0x%tx)!\n", - size * 2, (smmu->cb_base - gr0_base) * 2); + "SMMU address space size (0x%x) differs from mapped region size (0x%x)!\n", + 2 * size << smmu->pgshift, smmu->numpage); + /* Now properly encode NUMPAGE to subsequently derive SMMU_CB_BASE */ + smmu->numpage = size; smmu->num_s2_context_banks = FIELD_GET(ID1_NUMS2CB, id); smmu->num_context_banks = FIELD_GET(ID1_NUMCB, id); @@ -2200,7 +2201,11 @@ static int arm_smmu_device_probe(struct platform_device *pdev) smmu->base = devm_ioremap_resource(dev, res); if (IS_ERR(smmu->base)) return PTR_ERR(smmu->base); - smmu->cb_base = smmu->base + resource_size(res) / 2; + /* +* The resource size should effectively match the value of SMMU_TOP; +* stash that temporarily until we know PAGESIZE to validate it with. +*/ + smmu->numpage = resource_size(res); num_irqs = 0; while ((res = platform_get_resource(pdev, IORESOURCE_IRQ, num_irqs))) { -- 2.21.0.dirty ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
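The cb_base-as-page-number arithmetic from this patch can be sketched as follows; the numeric values are illustrative (4KB pages, 64 pages of global space), not taken from real hardware:

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/*
 * After the rework, the context bank space starts "numpage" architectural
 * pages above the SMMU base, so context bank n lives at page numpage + n.
 */
struct smmu_layout {
    unsigned int numpage;   /* pages of global space below SMMU_CB_BASE */
    unsigned int pgshift;
};

static size_t cb_offset(const struct smmu_layout *s, int n)
{
    return (size_t)(s->numpage + n) << s->pgshift;
}

/*
 * The probe-time sanity check: the whole address space is twice the
 * global space (global + context banks), and should match the size of
 * the mapped resource.
 */
static int size_mismatch(const struct smmu_layout *s, size_t resource_size)
{
    return resource_size != ((size_t)2 * s->numpage << s->pgshift);
}
```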
[PATCH v2 04/17] iommu/arm-smmu: Convert GR1 registers to bitfields
As for GR0, use the bitfield helpers to make GR1 usage a little cleaner, and use it as an opportunity to audit and tidy the definitions. This tweaks the handling of CBAR types to match what we did for S2CR a while back, and fixes a couple of names which didn't quite match the latest architecture spec (IHI0062D.c). Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu-regs.h | 33 ++--- drivers/iommu/arm-smmu.c | 18 +- 2 files changed, 23 insertions(+), 28 deletions(-) diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h index 351ab09c7d4f..8522330ee624 100644 --- a/drivers/iommu/arm-smmu-regs.h +++ b/drivers/iommu/arm-smmu-regs.h @@ -108,30 +108,25 @@ enum arm_smmu_s2cr_type { /* Context bank attribute registers */ #define ARM_SMMU_GR1_CBAR(n) (0x0 + ((n) << 2)) -#define CBAR_VMID_SHIFT0 -#define CBAR_VMID_MASK 0xff -#define CBAR_S1_BPSHCFG_SHIFT 8 -#define CBAR_S1_BPSHCFG_MASK 3 -#define CBAR_S1_BPSHCFG_NSH3 -#define CBAR_S1_MEMATTR_SHIFT 12 -#define CBAR_S1_MEMATTR_MASK 0xf +#define CBAR_IRPTNDX GENMASK(31, 24) +#define CBAR_TYPE GENMASK(17, 16) +enum arm_smmu_cbar_type { + CBAR_TYPE_S2_TRANS, + CBAR_TYPE_S1_TRANS_S2_BYPASS, + CBAR_TYPE_S1_TRANS_S2_FAULT, + CBAR_TYPE_S1_TRANS_S2_TRANS, +}; +#define CBAR_S1_MEMATTRGENMASK(15, 12) #define CBAR_S1_MEMATTR_WB 0xf -#define CBAR_TYPE_SHIFT16 -#define CBAR_TYPE_MASK 0x3 -#define CBAR_TYPE_S2_TRANS (0 << CBAR_TYPE_SHIFT) -#define CBAR_TYPE_S1_TRANS_S2_BYPASS (1 << CBAR_TYPE_SHIFT) -#define CBAR_TYPE_S1_TRANS_S2_FAULT(2 << CBAR_TYPE_SHIFT) -#define CBAR_TYPE_S1_TRANS_S2_TRANS(3 << CBAR_TYPE_SHIFT) -#define CBAR_IRPTNDX_SHIFT 24 -#define CBAR_IRPTNDX_MASK 0xff +#define CBAR_S1_BPSHCFGGENMASK(9, 8) +#define CBAR_S1_BPSHCFG_NSH3 +#define CBAR_VMID GENMASK(7, 0) #define ARM_SMMU_GR1_CBFRSYNRA(n) (0x400 + ((n) << 2)) #define ARM_SMMU_GR1_CBA2R(n) (0x800 + ((n) << 2)) -#define CBA2R_RW64_32BIT (0 << 0) -#define CBA2R_RW64_64BIT (1 << 0) -#define CBA2R_VMID_SHIFT 16 -#define CBA2R_VMID_MASK0x +#define 
CBA2R_VMID16 GENMASK(31, 16) +#define CBA2R_VA64 BIT(0) #define ARM_SMMU_CB_SCTLR 0x0 #define ARM_SMMU_CB_ACTLR 0x4 diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 105015798c06..293a95b0d682 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -237,7 +237,7 @@ struct arm_smmu_cfg { u16 asid; u16 vmid; }; - u32 cbar; + enum arm_smmu_cbar_type cbar; enum arm_smmu_context_fmt fmt; }; #define INVALID_IRPTNDX0xff @@ -692,31 +692,31 @@ static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx) /* CBA2R */ if (smmu->version > ARM_SMMU_V1) { if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) - reg = CBA2R_RW64_64BIT; + reg = CBA2R_VA64; else - reg = CBA2R_RW64_32BIT; + reg = 0; /* 16-bit VMIDs live in CBA2R */ if (smmu->features & ARM_SMMU_FEAT_VMID16) - reg |= cfg->vmid << CBA2R_VMID_SHIFT; + reg |= FIELD_PREP(CBA2R_VMID16, cfg->vmid); writel_relaxed(reg, gr1_base + ARM_SMMU_GR1_CBA2R(idx)); } /* CBAR */ - reg = cfg->cbar; + reg = FIELD_PREP(CBAR_TYPE, cfg->cbar); if (smmu->version < ARM_SMMU_V2) - reg |= cfg->irptndx << CBAR_IRPTNDX_SHIFT; + reg |= FIELD_PREP(CBAR_IRPTNDX, cfg->irptndx); /* * Use the weakest shareability/memory types, so they are * overridden by the ttbcr/pte. */ if (stage1) { - reg |= (CBAR_S1_BPSHCFG_NSH << CBAR_S1_BPSHCFG_SHIFT) | - (CBAR_S1_MEMATTR_WB << CBAR_S1_MEMATTR_SHIFT); + reg |= FIELD_PREP(CBAR_S1_BPSHCFG, CBAR_S1_BPSHCFG_NSH) | + FIELD_PREP(CBAR_S1_MEMATTR, CBAR_S1_MEMATTR_WB); } else if (!(smmu->features & ARM_SMMU_FEAT_VMID16)) { /* 8-bit VMIDs live in CBAR */ - reg |= cfg->vmid << CBAR_VMID_SHIFT; + reg |= FIELD_PREP(CBAR_VMID, cfg->vmid); } writel_relaxed(reg, gr1_base + ARM_SMMU_GR1_CBAR(idx)); -- 2.21.0.dirty ___ iommu mailing list
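To illustrate what the bitfield conversion buys, here is a userspace sketch of assembling a CBAR value with `FIELD_PREP()`-style helpers. The field positions match the definitions in the patch; the helper macros are simplified stand-ins for the kernel's `<linux/bitfield.h>` versions:

```c
#include <stdint.h>
#include <assert.h>

/* Simplified stand-ins for the kernel's GENMASK()/FIELD_PREP(). */
#define GENMASK(h, l)         (((~0u) >> (31 - (h))) & ((~0u) << (l)))
#define FIELD_PREP(mask, val) (((uint32_t)(val) * ((mask) & -(mask))) & (mask))

/* Field layout from the patch. */
#define CBAR_IRPTNDX GENMASK(31, 24)
#define CBAR_TYPE    GENMASK(17, 16)
#define CBAR_VMID    GENMASK(7, 0)
#define CBAR_TYPE_S1_TRANS_S2_BYPASS 1

/* Assemble CBAR much as arm_smmu_write_context_bank() now does. */
static uint32_t make_cbar(unsigned int type, unsigned int irptndx,
                          unsigned int vmid)
{
    return FIELD_PREP(CBAR_TYPE, type) |
           FIELD_PREP(CBAR_IRPTNDX, irptndx) |
           FIELD_PREP(CBAR_VMID, vmid);
}
```

Compared with the old `(val << SHIFT)` pairs, the mask alone now encodes both position and width, so a misplaced field is a compile-time visible mismatch rather than a silent truncation.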
[PATCH v2 02/17] iommu/qcom: Mask TLBI addresses correctly
As with arm-smmu from whence this code was borrowed, the IOVAs passed in here happen to be at least page-aligned anyway, but still; oh dear. Signed-off-by: Robin Murphy --- drivers/iommu/qcom_iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c index 34d0b9783b3e..bed948c3058a 100644 --- a/drivers/iommu/qcom_iommu.c +++ b/drivers/iommu/qcom_iommu.c @@ -155,7 +155,7 @@ static void qcom_iommu_tlb_inv_range_nosync(unsigned long iova, size_t size, struct qcom_iommu_ctx *ctx = to_ctx(fwspec, fwspec->ids[i]); size_t s = size; - iova &= ~12UL; + iova = (iova >> 12) << 12; iova |= ctx->asid; do { iommu_writel(ctx, reg, iova); -- 2.21.0.dirty
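The TLBIVA operand built here packs the context's ASID into the low bits of a page-aligned VA, which is why the address must genuinely be masked to a 4KB boundary first. A minimal sketch of the fixed construction:

```c
#include <stdint.h>
#include <assert.h>

/*
 * Build a TLBIVA operand: drop the 12 page-offset bits, then OR in the
 * ASID, as the fixed qcom_iommu_tlb_inv_range_nosync() does.
 */
static uint64_t tlbiva_operand(uint64_t iova, uint8_t asid)
{
    iova = (iova >> 12) << 12;  /* page-align: clear bits [11:0] */
    return iova | asid;
}
```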
[PATCH v2 05/17] iommu/arm-smmu: Convert context bank registers to bitfields
Finish the final part of the job, once again updating some names to match the current spec. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu-regs.h | 86 ++- drivers/iommu/arm-smmu.c | 16 +++ drivers/iommu/qcom_iommu.c| 13 +++--- 3 files changed, 59 insertions(+), 56 deletions(-) diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h index 8522330ee624..a8e288192285 100644 --- a/drivers/iommu/arm-smmu-regs.h +++ b/drivers/iommu/arm-smmu-regs.h @@ -129,19 +129,59 @@ enum arm_smmu_cbar_type { #define CBA2R_VA64 BIT(0) #define ARM_SMMU_CB_SCTLR 0x0 +#define SCTLR_S1_ASIDPNE BIT(12) +#define SCTLR_CFCFGBIT(7) +#define SCTLR_CFIE BIT(6) +#define SCTLR_CFRE BIT(5) +#define SCTLR_EBIT(4) +#define SCTLR_AFE BIT(2) +#define SCTLR_TRE BIT(1) +#define SCTLR_MBIT(0) + #define ARM_SMMU_CB_ACTLR 0x4 + #define ARM_SMMU_CB_RESUME 0x8 -#define ARM_SMMU_CB_TTBCR2 0x10 +#define RESUME_TERMINATE BIT(0) + +#define ARM_SMMU_CB_TCR2 0x10 +#define TCR2_SEP GENMASK(17, 15) +#define TCR2_SEP_UPSTREAM 0x7 +#define TCR2_ASBIT(4) + #define ARM_SMMU_CB_TTBR0 0x20 #define ARM_SMMU_CB_TTBR1 0x28 -#define ARM_SMMU_CB_TTBCR 0x30 +#define TTBRn_ASID GENMASK_ULL(63, 48) + +#define ARM_SMMU_CB_TCR0x30 #define ARM_SMMU_CB_CONTEXTIDR 0x34 #define ARM_SMMU_CB_S1_MAIR0 0x38 #define ARM_SMMU_CB_S1_MAIR1 0x3c + #define ARM_SMMU_CB_PAR0x50 +#define CB_PAR_F BIT(0) + #define ARM_SMMU_CB_FSR0x58 +#define FSR_MULTI BIT(31) +#define FSR_SS BIT(30) +#define FSR_UUTBIT(8) +#define FSR_ASFBIT(7) +#define FSR_TLBLKF BIT(6) +#define FSR_TLBMCF BIT(5) +#define FSR_EF BIT(4) +#define FSR_PF BIT(3) +#define FSR_AFFBIT(2) +#define FSR_TF BIT(1) + +#define FSR_IGN(FSR_AFF | FSR_ASF | \ +FSR_TLBMCF | FSR_TLBLKF) +#define FSR_FAULT (FSR_MULTI | FSR_SS | FSR_UUT | \ +FSR_EF | FSR_PF | FSR_TF | FSR_IGN) + #define ARM_SMMU_CB_FAR0x60 + #define ARM_SMMU_CB_FSYNR0 0x68 +#define FSYNR0_WNR BIT(4) + #define ARM_SMMU_CB_S1_TLBIVA 0x600 #define ARM_SMMU_CB_S1_TLBIASID0x610 #define ARM_SMMU_CB_S1_TLBIVAL 
0x620 @@ -150,46 +190,8 @@ enum arm_smmu_cbar_type { #define ARM_SMMU_CB_TLBSYNC0x7f0 #define ARM_SMMU_CB_TLBSTATUS 0x7f4 #define ARM_SMMU_CB_ATS1PR 0x800 + #define ARM_SMMU_CB_ATSR 0x8f0 - -#define SCTLR_S1_ASIDPNE (1 << 12) -#define SCTLR_CFCFG(1 << 7) -#define SCTLR_CFIE (1 << 6) -#define SCTLR_CFRE (1 << 5) -#define SCTLR_E(1 << 4) -#define SCTLR_AFE (1 << 2) -#define SCTLR_TRE (1 << 1) -#define SCTLR_M(1 << 0) - -#define CB_PAR_F (1 << 0) - -#define ATSR_ACTIVE(1 << 0) - -#define RESUME_RETRY (0 << 0) -#define RESUME_TERMINATE (1 << 0) - -#define TTBCR2_SEP_SHIFT 15 -#define TTBCR2_SEP_UPSTREAM(0x7 << TTBCR2_SEP_SHIFT) -#define TTBCR2_AS (1 << 4) - -#define TTBRn_ASID_SHIFT 48 - -#define FSR_MULTI (1 << 31) -#define FSR_SS (1 << 30) -#define FSR_UUT(1 << 8) -#define FSR_ASF(1 << 7) -#define FSR_TLBLKF (1 << 6) -#define FSR_TLBMCF (1 << 5) -#define FSR_EF (1 << 4) -#define FSR_PF (1 << 3) -#define FSR_AFF(1 << 2) -#define FSR_TF (1 << 1) - -#define FSR_IGN(FSR_AFF | FSR_ASF | \ -FSR_TLBMCF | FSR_TLBLKF) -#define FSR_FAULT (FSR_MULTI | FSR_SS | FSR_UUT | \ -FSR_EF | FSR_PF | FSR_TF | FSR_IGN) - -#define FSYNR0_WNR (1 << 4) +#define
[PATCH v2 03/17] iommu/arm-smmu: Convert GR0 registers to bitfields
FIELD_PREP remains a terrible name, but the overall simplification will make further work on this stuff that much more manageable. This also serves as an audit of the header, wherein we can impose a consistent grouping and ordering of the offset and field definitions Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu-regs.h | 126 -- drivers/iommu/arm-smmu.c | 51 +++--- 2 files changed, 84 insertions(+), 93 deletions(-) diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h index 1c278f7ae888..351ab09c7d4f 100644 --- a/drivers/iommu/arm-smmu-regs.h +++ b/drivers/iommu/arm-smmu-regs.h @@ -10,111 +10,101 @@ #ifndef _ARM_SMMU_REGS_H #define _ARM_SMMU_REGS_H +#include + /* Configuration registers */ #define ARM_SMMU_GR0_sCR0 0x0 -#define sCR0_CLIENTPD (1 << 0) -#define sCR0_GFRE (1 << 1) -#define sCR0_GFIE (1 << 2) -#define sCR0_EXIDENABLE(1 << 3) -#define sCR0_GCFGFRE (1 << 4) -#define sCR0_GCFGFIE (1 << 5) -#define sCR0_USFCFG(1 << 10) -#define sCR0_VMIDPNE (1 << 11) -#define sCR0_PTM (1 << 12) -#define sCR0_FB(1 << 13) -#define sCR0_VMID16EN (1 << 31) -#define sCR0_BSU_SHIFT 14 -#define sCR0_BSU_MASK 0x3 +#define sCR0_VMID16EN BIT(31) +#define sCR0_BSU GENMASK(15, 14) +#define sCR0_FBBIT(13) +#define sCR0_PTM BIT(12) +#define sCR0_VMIDPNE BIT(11) +#define sCR0_USFCFGBIT(10) +#define sCR0_GCFGFIE BIT(5) +#define sCR0_GCFGFRE BIT(4) +#define sCR0_EXIDENABLEBIT(3) +#define sCR0_GFIE BIT(2) +#define sCR0_GFRE BIT(1) +#define sCR0_CLIENTPD BIT(0) /* Auxiliary Configuration register */ #define ARM_SMMU_GR0_sACR 0x10 /* Identification registers */ #define ARM_SMMU_GR0_ID0 0x20 +#define ID0_S1TS BIT(30) +#define ID0_S2TS BIT(29) +#define ID0_NTSBIT(28) +#define ID0_SMSBIT(27) +#define ID0_ATOSNS BIT(26) +#define ID0_PTFS_NO_AARCH32BIT(25) +#define ID0_PTFS_NO_AARCH32S BIT(24) +#define ID0_NUMIRPTGENMASK(23, 16) +#define ID0_CTTW BIT(14) +#define ID0_NUMSIDBGENMASK(12, 9) +#define ID0_EXIDS BIT(8) +#define ID0_NUMSMRGGENMASK(7, 0) + #define 
ARM_SMMU_GR0_ID1 0x24 +#define ID1_PAGESIZE BIT(31) +#define ID1_NUMPAGENDXBGENMASK(30, 28) +#define ID1_NUMS2CBGENMASK(23, 16) +#define ID1_NUMCB GENMASK(7, 0) + #define ARM_SMMU_GR0_ID2 0x28 +#define ID2_VMID16 BIT(15) +#define ID2_PTFS_64K BIT(14) +#define ID2_PTFS_16K BIT(13) +#define ID2_PTFS_4KBIT(12) +#define ID2_UBSGENMASK(11, 8) +#define ID2_OASGENMASK(7, 4) +#define ID2_IASGENMASK(3, 0) + #define ARM_SMMU_GR0_ID3 0x2c #define ARM_SMMU_GR0_ID4 0x30 #define ARM_SMMU_GR0_ID5 0x34 #define ARM_SMMU_GR0_ID6 0x38 + #define ARM_SMMU_GR0_ID7 0x3c +#define ID7_MAJOR GENMASK(7, 4) +#define ID7_MINOR GENMASK(3, 0) + #define ARM_SMMU_GR0_sGFSR 0x48 #define ARM_SMMU_GR0_sGFSYNR0 0x50 #define ARM_SMMU_GR0_sGFSYNR1 0x54 #define ARM_SMMU_GR0_sGFSYNR2 0x58 -#define ID0_S1TS (1 << 30) -#define ID0_S2TS (1 << 29) -#define ID0_NTS(1 << 28) -#define ID0_SMS(1 << 27) -#define ID0_ATOSNS (1 << 26) -#define ID0_PTFS_NO_AARCH32(1 << 25) -#define ID0_PTFS_NO_AARCH32S (1 << 24) -#define ID0_CTTW (1 << 14) -#define ID0_NUMIRPT_SHIFT 16 -#define ID0_NUMIRPT_MASK 0xff -#define ID0_NUMSIDB_SHIFT 9 -#define ID0_NUMSIDB_MASK 0xf -#define ID0_EXIDS (1 << 8) -#define ID0_NUMSMRG_SHIFT 0 -#define ID0_NUMSMRG_MASK 0xff - -#define ID1_PAGESIZE (1 << 31) -#define ID1_NUMPAGENDXB_SHIFT 28 -#define ID1_NUMPAGENDXB_MASK 7 -#define ID1_NUMS2CB_SHIFT 16 -#define ID1_NUMS2CB_MASK
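The decode side of the same conversion can be sketched in userspace with simplified stand-ins for `GENMASK()`/`FIELD_GET()`, using the ID1 field layout from the patch (the helper macros below are illustrative, not the kernel's exact definitions):

```c
#include <stdint.h>
#include <assert.h>

/* Simplified stand-ins for the kernel's GENMASK()/FIELD_GET(). */
#define GENMASK(h, l)        (((~0u) >> (31 - (h))) & ((~0u) << (l)))
#define FIELD_GET(mask, reg) (((reg) & (mask)) / ((mask) & -(mask)))

/* ID1 field layout from the patch. */
#define ID1_PAGESIZE    (1u << 31)
#define ID1_NUMPAGENDXB GENMASK(30, 28)
#define ID1_NUMCB       GENMASK(7, 0)

/* PAGESIZE selects 64KB vs 4KB architectural pages. */
static unsigned int id1_pgshift(uint32_t id)
{
    return (id & ID1_PAGESIZE) ? 16 : 12;
}

static unsigned int id1_numcb(uint32_t id)
{
    return FIELD_GET(ID1_NUMCB, id);
}
```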
[PATCH v2 01/17] iommu/arm-smmu: Mask TLBI address correctly
The less said about "~12UL" the better. Oh dear. We get away with it due to calling constraints that mean IOVAs are implicitly at least page-aligned to begin with, but still; oh dear. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 64977c131ee6..d60ee292ecee 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -504,7 +504,7 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size, reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL : ARM_SMMU_CB_S1_TLBIVA; if (cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) { - iova &= ~12UL; + iova = (iova >> 12) << 12; iova |= cfg->asid; do { writel_relaxed(iova, reg); -- 2.21.0.dirty
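For anyone squinting at why "~12UL" was wrong: it clears only bits 2 and 3 (~12 is ...11110011), so almost all of the page offset survives, whereas the fix clears the full 12 offset bits. A two-line demonstration:

```c
#include <stdint.h>
#include <assert.h>

/* The old, buggy "alignment": clears only bits 2 and 3. */
static uint64_t buggy_align(uint64_t iova) { return iova & ~12UL; }

/* The fix: drop and restore the low 12 bits, leaving a 4KB-aligned VA. */
static uint64_t fixed_align(uint64_t iova) { return (iova >> 12) << 12; }
```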
[PATCH v2 00/17] Arm SMMU refactoring
Hi all, v1 for context: https://patchwork.kernel.org/cover/11087347/ Here's a quick v2 attempting to address all the minor comments; I've tweaked a whole bunch of names, added some verbosity in macros and comments for clarity, and rejigged arm_smmu_impl_init() for a bit more structure. The (new) patches #1 and #2 are up front as conceptual fixes, although they're not actually critical - it turns out to be more of an embarrassment than a real problem in practice. For ease of reference, the overall diff against v1 is attached below. Robin. Robin Murphy (17): iommu/arm-smmu: Mask TLBI address correctly iommu/qcom: Mask TLBI addresses correctly iommu/arm-smmu: Convert GR0 registers to bitfields iommu/arm-smmu: Convert GR1 registers to bitfields iommu/arm-smmu: Convert context bank registers to bitfields iommu/arm-smmu: Rework cb_base handling iommu/arm-smmu: Split arm_smmu_tlb_inv_range_nosync() iommu/arm-smmu: Get rid of weird "atomic" write iommu/arm-smmu: Abstract GR1 accesses iommu/arm-smmu: Abstract context bank accesses iommu/arm-smmu: Abstract GR0 accesses iommu/arm-smmu: Rename arm-smmu-regs.h iommu/arm-smmu: Add implementation infrastructure iommu/arm-smmu: Move Secure access quirk to implementation iommu/arm-smmu: Add configuration implementation hook iommu/arm-smmu: Add reset implementation hook iommu/arm-smmu: Add context init implementation hook MAINTAINERS | 3 +- drivers/iommu/Makefile| 2 +- drivers/iommu/arm-smmu-impl.c | 174 +++ drivers/iommu/arm-smmu-regs.h | 210 - drivers/iommu/arm-smmu.c | 573 +++--- drivers/iommu/arm-smmu.h | 394 +++ drivers/iommu/qcom_iommu.c| 17 +- 7 files changed, 764 insertions(+), 609 deletions(-) create mode 100644 drivers/iommu/arm-smmu-impl.c delete mode 100644 drivers/iommu/arm-smmu-regs.h create mode 100644 drivers/iommu/arm-smmu.h ->8- diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index 3c731e087854..e22e9004f449 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ 
-28,7 +28,7 @@ static int arm_smmu_gr0_ns(int offset) static u32 arm_smmu_read_ns(struct arm_smmu_device *smmu, int page, int offset) { - if (page == 0) + if (page == ARM_SMMU_GR0) offset = arm_smmu_gr0_ns(offset); return readl_relaxed(arm_smmu_page(smmu, page) + offset); } @@ -36,7 +36,7 @@ static u32 arm_smmu_read_ns(struct arm_smmu_device *smmu, int page, static void arm_smmu_write_ns(struct arm_smmu_device *smmu, int page, int offset, u32 val) { - if (page == 0) + if (page == ARM_SMMU_GR0) offset = arm_smmu_gr0_ns(offset); writel_relaxed(val, arm_smmu_page(smmu, page) + offset); } @@ -52,18 +52,17 @@ struct cavium_smmu { struct arm_smmu_device smmu; u32 id_base; }; -#define to_csmmu(s)container_of(s, struct cavium_smmu, smmu) static int cavium_cfg_probe(struct arm_smmu_device *smmu) { static atomic_t context_count = ATOMIC_INIT(0); + struct cavium_smmu *cs = container_of(smmu, struct cavium_smmu, smmu); /* * Cavium CN88xx erratum #27704. * Ensure ASID and VMID allocation is unique across all SMMUs in * the system. 
*/ - to_csmmu(smmu)->id_base = atomic_fetch_add(smmu->num_context_banks, - _count); + cs->id_base = atomic_fetch_add(smmu->num_context_banks, _count); dev_notice(smmu->dev, "\tenabling workaround for Cavium erratum 27704\n"); return 0; @@ -71,12 +70,13 @@ static int cavium_cfg_probe(struct arm_smmu_device *smmu) int cavium_init_context(struct arm_smmu_domain *smmu_domain) { - u32 id_base = to_csmmu(smmu_domain->smmu)->id_base; + struct cavium_smmu *cs = container_of(smmu_domain->smmu, + struct cavium_smmu, smmu); if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2) - smmu_domain->cfg.vmid += id_base; + smmu_domain->cfg.vmid += cs->id_base; else - smmu_domain->cfg.asid += id_base; + smmu_domain->cfg.asid += cs->id_base; return 0; } @@ -88,18 +88,18 @@ const struct arm_smmu_impl cavium_impl = { struct arm_smmu_device *cavium_smmu_impl_init(struct arm_smmu_device *smmu) { - struct cavium_smmu *csmmu; + struct cavium_smmu *cs; - csmmu = devm_kzalloc(smmu->dev, sizeof(*csmmu), GFP_KERNEL); - if (!csmmu) + cs = devm_kzalloc(smmu->dev, sizeof(*cs), GFP_KERNEL); + if (!cs) return ERR_PTR(-ENOMEM); - csmmu->smmu = *smmu; - csmmu->smmu.impl = _impl; + cs->smmu = *smmu; + cs->smmu.impl = _impl; devm_kfree(smmu->dev, smmu); - return >smmu; +
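The subclassing idiom used for `cavium_smmu` — embed the base struct and recover the wrapper with `container_of()` — is worth a standalone sketch, since the whole implementation-hook design leans on it (struct names here mirror the patch but are simplified):

```c
#include <stddef.h>
#include <assert.h>

/* Minimal container_of, as in the kernel. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Base device struct and a Cavium-style wrapper embedding it. */
struct smmu_device {
    int num_context_banks;
};

struct cavium_smmu {
    struct smmu_device smmu;   /* recovered via container_of() */
    unsigned int id_base;      /* implementation-private state */
};

/* Given only the base pointer, recover the implementation-private data. */
static unsigned int cavium_id_base(struct smmu_device *smmu)
{
    struct cavium_smmu *cs = container_of(smmu, struct cavium_smmu, smmu);

    return cs->id_base;
}
```

This is why generic code can keep passing `struct arm_smmu_device *` around while the Cavium hooks see their extra `id_base` without any global state.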
Messages to kexec@ get moderated (was: Crash kernel with 256 MB reserved memory runs into OOM condition)
Dear Dave, On 13.08.19 04:46, Dave Young wrote: > On 08/13/19 at 10:43am, Dave Young wrote: […] > The question is to Paul, also it would be always good to cc kexec mail > list for kexec and kdump issues. kexec@ was CCed in my original mail, but my messages got moderated. It'd be great if you checked that with the list administrators. > Your mail to 'kexec' with the subject > > Crash kernel with 256 MB reserved memory runs into OOM condition > > Is being held until the list moderator can review it for approval. > > The reason it is being held: > > Message has a suspicious header > > Either the message will get posted to the list, or you will receive > notification of the moderator's decision. If you would like to cancel > this posting, please visit the following URL: > > > http://lists.infradead.org/mailman/confirm/kexec/a23ab6162ef34d099af5dd86c46113def5152bb1 Kind regards, Paul
Re: [PATCH v6 5/8] iommu: Add bounce page APIs
On Thu, Aug 15, 2019 at 02:15:32PM +0800, Lu Baolu wrote: > iommu_map/unmap() APIs haven't parameters for dma direction and > attributions. These parameters are elementary for DMA APIs. Say, > after map, if the dma direction is TO_DEVICE and a bounce buffer is > used, we must sync the data from the original dma buffer to the bounce > buffer; In the opposite direction, if dma is FROM_DEVICE, before unmap, > we need to sync the data from the bounce buffer onto the original > buffer. The DMA direction from DMA-API maps to the protections in iommu_map(): DMA_FROM_DEVICE:IOMMU_WRITE DMA_TO_DEVICE: IOMMU_READ DMA_BIDIRECTIONAL IOMMU_READ | IOMMU_WRITE And for the sync DMA-API also has separate functions for either direction. So I don't see why these extra functions are needed in the IOMMU-API. Regards, Joerg
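The direction-to-protection mapping Joerg describes can be written as a small helper; the enum and flag values below are illustrative stand-ins, not the kernel's actual definitions:

```c
#include <assert.h>

enum dma_data_direction { DMA_BIDIRECTIONAL, DMA_TO_DEVICE, DMA_FROM_DEVICE };

#define IOMMU_READ  (1 << 0)
#define IOMMU_WRITE (1 << 1)

/*
 * DMA_TO_DEVICE means the device reads the buffer; DMA_FROM_DEVICE means
 * the device writes it; bidirectional needs both permissions.
 */
static int dma_dir_to_prot(enum dma_data_direction dir)
{
    switch (dir) {
    case DMA_TO_DEVICE:
        return IOMMU_READ;
    case DMA_FROM_DEVICE:
        return IOMMU_WRITE;
    case DMA_BIDIRECTIONAL:
    default:
        return IOMMU_READ | IOMMU_WRITE;
    }
}
```

Since the direction is already expressible as mapping protections, and the DMA API has per-direction sync calls, no extra direction-aware entry points are needed in the IOMMU API itself — which is the point of the reply.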
Re: [PATCH v3 1/2] iommu/io-pgtable-arm: Add support for ARM_ADRENO_GPU_LPAE io-pgtable format
On Wed, Aug 07, 2019 at 04:21:39PM -0600, Jordan Crouse wrote: > Add a new sub-format ARM_ADRENO_GPU_LPAE to set up TTBR0 and TTBR1 for > use by the Adreno GPU. This will allow The GPU driver to map global > buffers in the TTBR1 and leave the TTBR0 configured but unset and > free to be changed dynamically by the GPU. It would take a bit of code rework and un-static-ifying a few functions but I'm wondering if it would be cleaner to add the Adreno GPU pagetable format in a new file, such as io-pgtable-adreno.c. Jordan > Signed-off-by: Jordan Crouse > --- > > drivers/iommu/io-pgtable-arm.c | 214 > ++--- > drivers/iommu/io-pgtable.c | 1 + > include/linux/io-pgtable.h | 2 + > 3 files changed, 202 insertions(+), 15 deletions(-) > > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c > index 161a7d5..8eb0dbb 100644 > --- a/drivers/iommu/io-pgtable-arm.c > +++ b/drivers/iommu/io-pgtable-arm.c > @@ -112,13 +112,19 @@ > #define ARM_32_LPAE_TCR_EAE (1 << 31) > #define ARM_64_LPAE_S2_TCR_RES1 (1 << 31) > > +#define ARM_LPAE_TCR_EPD0(1 << 7) > #define ARM_LPAE_TCR_EPD1(1 << 23) > > #define ARM_LPAE_TCR_TG0_4K (0 << 14) > #define ARM_LPAE_TCR_TG0_64K (1 << 14) > #define ARM_LPAE_TCR_TG0_16K (2 << 14) > > +#define ARM_LPAE_TCR_TG1_4K (0 << 30) > +#define ARM_LPAE_TCR_TG1_64K (1 << 30) > +#define ARM_LPAE_TCR_TG1_16K (2 << 30) > + > #define ARM_LPAE_TCR_SH0_SHIFT 12 > +#define ARM_LPAE_TCR_SH1_SHIFT 28 > #define ARM_LPAE_TCR_SH0_MASK0x3 > #define ARM_LPAE_TCR_SH_NS 0 > #define ARM_LPAE_TCR_SH_OS 2 > @@ -126,6 +132,8 @@ > > #define ARM_LPAE_TCR_ORGN0_SHIFT 10 > #define ARM_LPAE_TCR_IRGN0_SHIFT 8 > +#define ARM_LPAE_TCR_ORGN1_SHIFT 26 > +#define ARM_LPAE_TCR_IRGN1_SHIFT 24 > #define ARM_LPAE_TCR_RGN_MASK0x3 > #define ARM_LPAE_TCR_RGN_NC 0 > #define ARM_LPAE_TCR_RGN_WBWA1 > @@ -136,6 +144,7 @@ > #define ARM_LPAE_TCR_SL0_MASK0x3 > > #define ARM_LPAE_TCR_T0SZ_SHIFT 0 > +#define ARM_LPAE_TCR_T1SZ_SHIFT 16 > #define ARM_LPAE_TCR_SZ_MASK 0xf > > #define 
ARM_LPAE_TCR_PS_SHIFT16 > @@ -152,6 +161,14 @@ > #define ARM_LPAE_TCR_PS_48_BIT 0x5ULL > #define ARM_LPAE_TCR_PS_52_BIT 0x6ULL > > +#define ARM_LPAE_TCR_SEP_SHIFT 47 > +#define ARM_LPAE_TCR_SEP_31 (0x0ULL << ARM_LPAE_TCR_SEP_SHIFT) > +#define ARM_LPAE_TCR_SEP_35 (0x1ULL << ARM_LPAE_TCR_SEP_SHIFT) > +#define ARM_LPAE_TCR_SEP_39 (0x2ULL << ARM_LPAE_TCR_SEP_SHIFT) > +#define ARM_LPAE_TCR_SEP_41 (0x3ULL << ARM_LPAE_TCR_SEP_SHIFT) > +#define ARM_LPAE_TCR_SEP_43 (0x4ULL << ARM_LPAE_TCR_SEP_SHIFT) > +#define ARM_LPAE_TCR_SEP_UPSTREAM(0x7ULL << ARM_LPAE_TCR_SEP_SHIFT) > + > #define ARM_LPAE_MAIR_ATTR_SHIFT(n) ((n) << 3) > #define ARM_LPAE_MAIR_ATTR_MASK 0xff > #define ARM_LPAE_MAIR_ATTR_DEVICE0x04 > @@ -426,7 +443,8 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct > arm_lpae_io_pgtable *data, > arm_lpae_iopte pte; > > if (data->iop.fmt == ARM_64_LPAE_S1 || > - data->iop.fmt == ARM_32_LPAE_S1) { > + data->iop.fmt == ARM_32_LPAE_S1 || > + data->iop.fmt == ARM_ADRENO_GPU_LPAE) { > pte = ARM_LPAE_PTE_nG; > if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ)) > pte |= ARM_LPAE_PTE_AP_RDONLY; > @@ -497,6 +515,21 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, > unsigned long iova, > return ret; > } > > +static int arm_adreno_gpu_lpae_map(struct io_pgtable_ops *ops, > + unsigned long iova, phys_addr_t paddr, size_t size, > + int iommu_prot) > +{ > + struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops); > + unsigned long mask = 1UL << data->iop.cfg.ias; > + > + /* This configuration expects all iova addresses to be in TTBR1 */ > + if (WARN_ON(iova & mask)) > + return -ERANGE; > + > + /* Mask off the sign extended bits and map as usual */ > + return arm_lpae_map(ops, iova & (mask - 1), paddr, size, iommu_prot); > +} > + > static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int > lvl, > arm_lpae_iopte *ptep) > { > @@ -643,6 +676,22 @@ static size_t __arm_lpae_unmap(struct > arm_lpae_io_pgtable *data, > return __arm_lpae_unmap(data, iova, size, lvl 
+ 1, ptep); > } > > +static size_t arm_adreno_gpu_lpae_unmap(struct io_pgtable_ops *ops, > +unsigned long iova, size_t size) > +{ > + struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops); > + arm_lpae_iopte *ptep = data->pgd; > + int lvl =
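The TTBR1 sign-extension handling at the heart of this patch can be sketched independently of the page-table walker: TTBR1 VAs have all bits above the input address size set to one, and only the low `ias` bits matter for the walk. The helper names and the `ias` values below are illustrative:

```c
#include <stdint.h>
#include <assert.h>

/* A VA is in the TTBR1 region if bits [63:ias] are all ones. */
static int is_ttbr1_va(uint64_t iova, unsigned int ias)
{
    return (iova >> ias) == (0xffffffffffffffffULL >> ias);
}

/* Mask off the sign-extension bits before walking the TTBR1 tables. */
static uint64_t ttbr1_offset(uint64_t iova, unsigned int ias)
{
    return iova & ((1ULL << ias) - 1);
}
```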
Re: [Freedreno] [PATCH v3 0/2] iommu/arm-smmu: Split pagetable support
On Wed, Aug 07, 2019 at 04:21:38PM -0600, Jordan Crouse wrote: > (Sigh, resend. I freaked out my SMTP server) > > This is part of an ongoing evolution for enabling split pagetable support for > arm-smmu. Previous versions can be found [1]. > > In the discussion for v2 Robin pointed out that this is a very Adreno specific > use case and that is exactly true. Not only do we want to configure and use a > pagetable in the TTBR1 space, we also want to configure the TTBR0 region but > not allocate a pagetable for it or touch it until the GPU hardware does so. As > much as I want it to be a generic concept it really isn't. > > This revision leans into that idea. Most of the same io-pgtable code is there > but now it is wrapped as an Adreno GPU specific format that is selected by the > compatible string in the arm-smmu device. > > Additionally, per Robin's suggestion we are skipping creating a TTBR0 > pagetable > to save on wasted memory. > > This isn't as clean as I would like it to be but I think that this is a better > direction than trying to pretend that the generic format would work. > > I'm tempting fate by posting this and then taking some time off, but I wanted > to try to kick off a conversation or at least get some flames so I can try to > refine this again next week. Please take a look and give some advice on the > direction. Will, Robin - Modulo the impl changes from Robin, do you think that using a dedicated pagetable format is the right approach for supporting split pagetables for the Adreno GPU? If so, then is adding the changes to io-pgtable-arm.c possible for 5.4 and then add the implementation specific code on top of Robin's stack later or do you feel they should come as part of a package deal? 
Jordan > Jordan Crouse (2): > iommu/io-pgtable-arm: Add support for ARM_ADRENO_GPU_LPAE io-pgtable > format > iommu/arm-smmu: Add support for Adreno GPU pagetable formats > > drivers/iommu/arm-smmu.c | 8 +- > drivers/iommu/io-pgtable-arm.c | 214 > ++--- > drivers/iommu/io-pgtable.c | 1 + > include/linux/io-pgtable.h | 2 + > 4 files changed, 209 insertions(+), 16 deletions(-) > > -- > 2.7.4 > > ___ > Freedreno mailing list > freedr...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/freedreno -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 2/2] iommu/arm-smmu-v3: add nr_ats_masters for quickly check
On Thu, Aug 15, 2019 at 01:44:39PM +0800, Zhen Lei wrote: > When (smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS) is true, even if a > smmu domain does not contain any ats master, the operations of > arm_smmu_atc_inv_to_cmd() and lock protection in arm_smmu_atc_inv_domain() > are always executed. This will impact performance, especially in > multi-core and stress scenarios. For my FIO test scenario, about 8% > performance reduced. > > In fact, we can use a struct member to record how many ats masters that > the smmu contains. And check that without traverse the list and check all > masters one by one in the lock protection. > > Fixes: 9ce27afc0830 ("iommu/arm-smmu-v3: Add support for PCI ATS") > Signed-off-by: Zhen Lei > --- > drivers/iommu/arm-smmu-v3.c | 14 +- > 1 file changed, 13 insertions(+), 1 deletion(-) > > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c > index 29056d9bb12aa01..154334d3310c9b8 100644 > --- a/drivers/iommu/arm-smmu-v3.c > +++ b/drivers/iommu/arm-smmu-v3.c > @@ -631,6 +631,7 @@ struct arm_smmu_domain { > > struct io_pgtable_ops *pgtbl_ops; > boolnon_strict; > + int nr_ats_masters; > > enum arm_smmu_domain_stage stage; > union { > @@ -1531,7 +1532,16 @@ static int arm_smmu_atc_inv_domain(struct > arm_smmu_domain *smmu_domain, > struct arm_smmu_cmdq_ent cmd; > struct arm_smmu_master *master; > > - if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS)) > + /* > + * The protectiom of spinlock(_domain->devices_lock) is omitted. > + * Because for a given master, its map/unmap operations should only be > + * happened after it has been attached and before it has been detached. > + * So that, if at least one master need to be atc invalidated, the > + * value of smmu_domain->nr_ats_masters can not be zero. > + * > + * This can alleviate performance loss in multi-core scenarios. 
> + */ I find this reasoning pretty dubious, since I think you're assuming that an endpoint cannot issue speculative ATS translation requests once its ATS capability is enabled. That said, I think it also means we should enable ATS in the STE *before* enabling it in the endpoint -- the current logic looks like it's the wrong way round to me (including in detach()). Anyway, these speculative translations could race with a concurrent unmap() call and end up with the ATC containing translations for unmapped pages, which I think we should try to avoid. Did the RCU approach not work out? You could use an rwlock instead as a temporary bodge if the performance doesn't hurt too much. Alternatively... maybe we could change the attach flow to do something like: enable_ats_in_ste(master); enable_ats_at_pcie_endpoint(master); spin_lock(devices_lock) add_to_device_list(master); nr_ats_masters++; spin_unlock(devices_lock); invalidate_atc(master); in which case, the concurrent unmapper will be doing something like: issue_tlbi(); smp_mb(); if (READ_ONCE(nr_ats_masters)) { ... } and I *think* that means that either the unmapper will see the nr_ats_masters update and perform the invalidation, or they'll miss the update but the attach will invalidate the ATC /after/ the TLBI in the command queue. Also, John's idea of converting this stuff over to my command batching mechanism should help a lot if we can defer this to sync time using the gather structure. Maybe an rwlock would be alright for that. Dunno. Will
Re: [PATCH 15/15] iommu/arm-smmu: Add context init implementation hook
On Thu, Aug 15, 2019 at 01:09:07PM +0100, Robin Murphy wrote: > On 15/08/2019 11:56, Will Deacon wrote: > >On Fri, Aug 09, 2019 at 06:07:52PM +0100, Robin Murphy wrote: > >>Allocating and initialising a context for a domain is another point > >>where certain implementations are known to want special behaviour. > >>Currently the other half of the Cavium workaround comes into play here, > >>so let's finish the job to get the whole thing right out of the way. > >> > >>Signed-off-by: Robin Murphy > >>--- > >> drivers/iommu/arm-smmu-impl.c | 39 +-- > >> drivers/iommu/arm-smmu.c | 51 +++ > >> drivers/iommu/arm-smmu.h | 42 +++-- > >> 3 files changed, 86 insertions(+), 46 deletions(-) > >> > >>diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c > >>index c8904da08354..7a657d47b6ec 100644 > >>--- a/drivers/iommu/arm-smmu-impl.c > >>+++ b/drivers/iommu/arm-smmu-impl.c > >>@@ -48,6 +48,12 @@ const struct arm_smmu_impl calxeda_impl = { > >> }; > >>+struct cavium_smmu { > >>+ struct arm_smmu_device smmu; > >>+ u32 id_base; > >>+}; > >>+#define to_csmmu(s)container_of(s, struct cavium_smmu, smmu) > > > >To be honest with you, I'd just use container_of directly for the two > >callsites that need it. "to_csmmu" isn't a great name when we're also got > >the calxeda thing in here. > > Sure, by this point I was mostly just going for completeness in terms of > sketching out an example for subclassing arm_smmu_device. The Tegra patches > will now serve as a more complete example anyway, so indeed we can live > without it here. > > >> static int cavium_cfg_probe(struct arm_smmu_device *smmu) > >> { > >>static atomic_t context_count = ATOMIC_INIT(0); > >>@@ -56,17 +62,46 @@ static int cavium_cfg_probe(struct arm_smmu_device > >>*smmu) > >> * Ensure ASID and VMID allocation is unique across all SMMUs in > >> * the system. 
> >> */ > >>- smmu->cavium_id_base = atomic_fetch_add(smmu->num_context_banks, > >>+ to_csmmu(smmu)->id_base = atomic_fetch_add(smmu->num_context_banks, > >> &context_count); > >>dev_notice(smmu->dev, "\tenabling workaround for Cavium erratum > >> 27704\n"); > >>return 0; > >> } > >>+int cavium_init_context(struct arm_smmu_domain *smmu_domain) > >>+{ > >>+ u32 id_base = to_csmmu(smmu_domain->smmu)->id_base; > >>+ > >>+ if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2) > >>+ smmu_domain->cfg.vmid += id_base; > >>+ else > >>+ smmu_domain->cfg.asid += id_base; > >>+ > >>+ return 0; > >>+} > >>+ > >> const struct arm_smmu_impl cavium_impl = { > >>.cfg_probe = cavium_cfg_probe, > >>+ .init_context = cavium_init_context, > >> }; > >>+struct arm_smmu_device *cavium_smmu_impl_init(struct arm_smmu_device *smmu) > >>+{ > >>+ struct cavium_smmu *csmmu; > >>+ > >>+ csmmu = devm_kzalloc(smmu->dev, sizeof(*csmmu), GFP_KERNEL); > >>+ if (!csmmu) > >>+ return ERR_PTR(-ENOMEM); > >>+ > >>+ csmmu->smmu = *smmu; > >>+ csmmu->smmu.impl = &cavium_impl; > >>+ > >>+ devm_kfree(smmu->dev, smmu); > >>+ > >>+ return &csmmu->smmu; > >>+} > >>+ > >> #define ARM_MMU500_ACTLR_CPRE (1 << 1) > >>@@ -121,7 +156,7 @@ struct arm_smmu_device *arm_smmu_impl_init(struct > >>arm_smmu_device *smmu) > >>smmu->impl = &calxeda_impl; > >>if (smmu->model == CAVIUM_SMMUV2) > >>- smmu->impl = &cavium_impl; > >>+ return cavium_smmu_impl_init(smmu); > >>if (smmu->model == ARM_MMU500) > >>smmu->impl = &arm_mmu500_impl; > > > >Maybe rework this so we do the calxeda detection first (and return if we > >match), followed by a switch on smmu->model to make it crystal clear that > >we match only one? > > As I see it, "match only one" is really only a short-term thing, though, so > I didn't want to get *too* hung up on it. Ultimately we're going to have > cases where we need to combine e.g. 
MMU-500 implementation quirks with > platform integration quirks - I've been mostly planning on coming back to > think about that (and potentially rework this whole logic) later, but I > guess it wouldn't hurt to plan out a bit more structure from the start. I was going to ask something similar. I'm guessing that the intent is that we'll eventually we'll have a couple of arm-smmu-.c files and we'll need some sort of centralized place to set up the smmu->impl pointer. I had figured that it would be table based or something, but you make a good point about mixing and matching different workarounds. I don't really have a solution, just something I'm pondering while I'm thinking about how to start merging some of the qcom stuff into this. Jordan -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project ___ iommu mailing list iommu@lists.linux-foundation.org
Re: next take at setting up a dma mask by default for platform devices
On Thu, 15 Aug 2019, Christoph Hellwig wrote: > On Thu, Aug 15, 2019 at 03:23:18PM +0200, Greg Kroah-Hartman wrote: > > I've taken the first 2 patches for 5.3-final. Given that patch 3 needs > > to be fixed, I'll wait for a respin of these before considering them. > > I have a respun version ready, but I'd really like to hear some > comments from usb developers about the approach before spamming > everyone again.. I didn't see any problems with your approach at first glance; it looked like a good idea. Alan Stern
Re: [PATCH 02/13] iommu/io-pgtable-arm: Remove redundant call to io_pgtable_tlb_sync()
On 15/08/2019 14:57, Will Deacon wrote: Hi Robin, On Thu, Aug 15, 2019 at 01:43:11PM +0100, Robin Murphy wrote: On 14/08/2019 18:56, Will Deacon wrote: Commit b6b65ca20bc9 ("iommu/io-pgtable-arm: Add support for non-strict mode") added an unconditional call to io_pgtable_tlb_sync() immediately after the case where we replace a block entry with a table entry during an unmap() call. This is redundant, since the IOMMU API will call iommu_tlb_sync() on this path and the patch in question mentions this: | To save having to reason about it too much, make sure the invalidation | in arm_lpae_split_blk_unmap() just performs its own unconditional sync | to minimise the window in which we're technically violating the break- | before-make requirement on a live mapping. This might work out redundant | with an outer-level sync for strict unmaps, but we'll never be splitting | blocks on a DMA fastpath anyway. However, this sync gets in the way of deferred TLB invalidation for leaf entries and is at best a questionable, unproven hack. Remove it. Hey, that's my questionable, unproven hack! :P I thought you'd like to remain anonymous, but I can credit you if you like? ;) It's not entirely clear to me how this gets in the way though - AFAICS the intent of tlb_flush_leaf exactly matches the desired operation here, so couldn't these just wait to be converted in patch #8? Good point. I think there are two things: 1. Initially, I didn't plan to have tlb_flush_leaf() at all because I didn't think it would be needed. Then I ran into the v7s CONT stuff and ended up needing it after all (I think it's the only user). So that's an oversight. 2. If we do the tlb_flush_leaf() here, then we could potentially put a hole in the ongoing gather structure, but I suppose we could do both a tlb_add_page() *and* a tlb_flush_leaf() to get around that. So yes, I probably could move this back if the sync is necessary but... 
In principle the concern is that if the caller splits a block with iommu_unmap_fast(), there's no guarantee of seeing an iommu_tlb_sync() before returning to the caller, and thus there's the potential to run into a TLB conflict on a subsequent access even if the endpoint was "good" and didn't make any accesses *during* the unmap call. ... this just feels pretty theoretical to me. The fact of the matter is that we're unable to do break before make because we can't reliably tolerate faults. If the hardware actually requires BBM for correctness, then we should probably explore proper solutions (e.g. quirks, avoiding block mappings, handling faults) rather than emitting a random sync and hoping for the best. Did you add the sync just in case, or was it based on a real crash? Nope, just a theoretical best-effort thing, which I'm certainly not going to lose sleep over either way - I just felt compelled to question the rationale which didn't seem to fit. Realistically, this partial-unmap case is not well-defined in IOMMU API terms, and other drivers don't handle it consistently. I think VFIO explicitly rejects partial unmaps, so if we see them at all it's only likely to be from GPU/SVA type users who in principle ought to be able to tolerate transient faults from BBM anyway. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: next take at setting up a dma mask by default for platform devices
On Thu, Aug 15, 2019 at 03:25:31PM +0200, Christoph Hellwig wrote: > On Thu, Aug 15, 2019 at 03:23:18PM +0200, Greg Kroah-Hartman wrote: > > I've taken the first 2 patches for 5.3-final. Given that patch 3 needs > > to be fixed, I'll wait for a respin of these before considering them. > > I have a respun version ready, but I'd really like to hear some > comments from usb developers about the approach before spamming > everyone again.. Spam away, we can take it :)
Re: [PATCH 6/6] driver core: initialize a default DMA mask for platform device
On Thu, Aug 15, 2019 at 03:38:12PM +0200, Christoph Hellwig wrote: > On Thu, Aug 15, 2019 at 03:03:25PM +0200, Greg Kroah-Hartman wrote: > > > --- a/include/linux/platform_device.h > > > +++ b/include/linux/platform_device.h > > > @@ -24,6 +24,7 @@ struct platform_device { > > > int id; > > > bool id_auto; > > > struct device dev; > > > + u64 dma_mask; > > > > Why is the dma_mask in 'struct device' which is part of this structure, > > not sufficient here? Shouldn't the "platform" be setting that up > > correctly already in the "archdata" type callback? > > Because the dma_mask in struct device is a pointer that needs to point > to something, and this is the best space we can allocate for 'something'. > m68k and powerpc currently do something roughly equivalent at the moment, > while everyone else just has horrible, horrible hacks. As mentioned in > the changelog the intent of this patch is that we treat platform devices > like any other bus, where the bus allocates the space for the dma_mask. > The long term plan is to eventually kill that weird pointer indirection > that doesn't help anyone, but for that we need to sort out the basics > first. Ah, missed that, sorry. Ok, no objection from me. Might as well respin this series and I can queue it up after 5.3-rc5 is out (which will have your first 2 patches in it.) thanks, greg k-h
Re: [PATCH 02/13] iommu/io-pgtable-arm: Remove redundant call to io_pgtable_tlb_sync()
Hi Robin, On Thu, Aug 15, 2019 at 01:43:11PM +0100, Robin Murphy wrote: > On 14/08/2019 18:56, Will Deacon wrote: > > Commit b6b65ca20bc9 ("iommu/io-pgtable-arm: Add support for non-strict > > mode") added an unconditional call to io_pgtable_tlb_sync() immediately > > after the case where we replace a block entry with a table entry during > > an unmap() call. This is redundant, since the IOMMU API will call > > iommu_tlb_sync() on this path and the patch in question mentions this: > > > > | To save having to reason about it too much, make sure the invalidation > > | in arm_lpae_split_blk_unmap() just performs its own unconditional sync > > | to minimise the window in which we're technically violating the break- > > | before-make requirement on a live mapping. This might work out redundant > > | with an outer-level sync for strict unmaps, but we'll never be splitting > > | blocks on a DMA fastpath anyway. > > > > However, this sync gets in the way of deferred TLB invalidation for leaf > > entries and is at best a questionable, unproven hack. Remove it. > > Hey, that's my questionable, unproven hack! :P I thought you'd like to remain anonymous, but I can credit you if you like? ;) > It's not entirely clear to me how this gets in the way though - AFAICS the > intent of tlb_flush_leaf exactly matches the desired operation here, so > couldn't these just wait to be converted in patch #8? Good point. I think there are two things: 1. Initially, I didn't plan to have tlb_flush_leaf() at all because I didn't think it would be needed. Then I ran into the v7s CONT stuff and ended up needing it after all (I think it's the only user). So that's an oversight. 2. If we do the tlb_flush_leaf() here, then we could potentially put a hole in the ongoing gather structure, but I suppose we could do both a tlb_add_page() *and* a tlb_flush_leaf() to get around that. So yes, I probably could move this back if the sync is necessary but... 
> In principle the concern is that if the caller splits a block with > iommu_unmap_fast(), there's no guarantee of seeing an iommu_tlb_sync() > before returning to the caller, and thus there's the potential to run into a > TLB conflict on a subsequent access even if the endpoint was "good" and > didn't make any accesses *during* the unmap call. ... this just feels pretty theoretical to me. The fact of the matter is that we're unable to do break before make because we can't reliably tolerate faults. If the hardware actually requires BBM for correctness, then we should probably explore proper solutions (e.g. quirks, avoiding block mappings, handling faults) rather than emitting a random sync and hoping for the best. Did you add the sync just in case, or was it based on a real crash? Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 00/13] Rework IOMMU API to allow for batching of invalidation
On Thu, Aug 15, 2019 at 12:19:58PM +0100, John Garry wrote: > On 14/08/2019 18:56, Will Deacon wrote: > > If you'd like to play with the patches, then I've also pushed them here: > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/unmap > > > > but they should behave as a no-op on their own. > > As anticipated, my storage testing scenarios roughly give parity throughput > and CPU loading before and after this series. > > Patches to convert the > > Arm SMMUv3 driver to the new API are here: > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/cmdq > > I quickly tested this again and now I see a performance lift: > > before (5.3-rc1)after > D05 8x SAS disks 907K IOPS 970K IOPS > D05 1x NVMe 450K IOPS 466K IOPS > D06 1x NVMe 467K IOPS 466K IOPS > > The CPU loading seems to track throughput, so nothing much to say there. > > Note: From 5.2 testing, I was seeing >900K IOPS from that NVMe disk for > !IOMMU. Cheers, John. For interest, how do things look if you pass iommu.strict=0? That might give some indication about how much the invalidation is still hurting us. > BTW, what were your thoughts on changing > arm_smmu_atc_inv_domain()->arm_smmu_atc_inv_master() to batching? It seems > suitable, but looks untouched. Were you waiting for a resolution to the > performance issue which Leizhen reported? In principle, I'm supportive of such a change, but I'm not currently able to test any ATS stuff so somebody else would need to write the patch. Jean-Philippe is on holiday at the moment, but I'd be happy to review something from you if you send it out. Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: DMA-API: cacheline tracking ENOMEM, dma-debug disabled due to nouveau ?
On 15/08/2019 14:35, Christoph Hellwig wrote: On Wed, Aug 14, 2019 at 07:49:27PM +0200, Daniel Vetter wrote: On Wed, Aug 14, 2019 at 04:50:33PM +0200, Corentin Labbe wrote: Hello Since lot of release (at least since 4.19), I hit the following error message: DMA-API: cacheline tracking ENOMEM, dma-debug disabled After hitting that, I try to check who is creating so many DMA mapping and see: cat /sys/kernel/debug/dma-api/dump | cut -d' ' -f2 | sort | uniq -c 6 ahci 257 e1000e 6 ehci-pci 5891 nouveau 24 uhci_hcd Does nouveau having this high number of DMA mapping is normal ? Yeah seems perfectly fine for a gpu. That is a lot and apparently overwhelm the dma-debug tracking. Robin rewrote this code in Linux 4.21 to work a little better, so I'm curious why this might have changes in 4.19, as dma-debug did not change at all there. FWIW, the cacheline tracking entries are a separate thing from the dma-debug entries that I rejigged - judging by those numbers there should still be plenty of free dma-debug entries, but for some reason it has failed to extend the radix tree :/ Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 6/6] driver core: initialize a default DMA mask for platform device
On Thu, Aug 15, 2019 at 03:03:25PM +0200, Greg Kroah-Hartman wrote: > > --- a/include/linux/platform_device.h > > +++ b/include/linux/platform_device.h > > @@ -24,6 +24,7 @@ struct platform_device { > > int id; > > bool id_auto; > > struct device dev; > > + u64 dma_mask; > > Why is the dma_mask in 'struct device' which is part of this structure, > not sufficient here? Shouldn't the "platform" be setting that up > correctly already in the "archdata" type callback? Because the dma_mask in struct device is a pointer that needs to point to something, and this is the best space we can allocate for 'something'. m68k and powerpc currently do something roughly equivalent at the moment, while everyone else just has horrible, horrible hacks. As mentioned in the changelog the intent of this patch is that we treat platform devices like any other bus, where the bus allocates the space for the dma_mask. The long term plan is to eventually kill that weird pointer indirection that doesn't help anyone, but for that we need to sort out the basics first.
Re: DMA-API: cacheline tracking ENOMEM, dma-debug disabled due to nouveau ?
On Wed, Aug 14, 2019 at 07:49:27PM +0200, Daniel Vetter wrote: > On Wed, Aug 14, 2019 at 04:50:33PM +0200, Corentin Labbe wrote: > > Hello > > > > Since lot of release (at least since 4.19), I hit the following error > > message: > > DMA-API: cacheline tracking ENOMEM, dma-debug disabled > > > > After hitting that, I try to check who is creating so many DMA mapping and > > see: > > cat /sys/kernel/debug/dma-api/dump | cut -d' ' -f2 | sort | uniq -c > > 6 ahci > > 257 e1000e > > 6 ehci-pci > >5891 nouveau > > 24 uhci_hcd > > > > Does nouveau having this high number of DMA mapping is normal ? > > Yeah seems perfectly fine for a gpu. That is a lot and apparently overwhelm the dma-debug tracking. Robin rewrote this code in Linux 4.21 to work a little better, so I'm curious why this might have changes in 4.19, as dma-debug did not change at all there. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 6/6] driver core: initialize a default DMA mask for platform device
On Wed, Aug 14, 2019 at 04:49:13PM +0100, Robin Murphy wrote: >> because we have to support platform_device structures that are >> statically allocated. > > This would be a good point to also get rid of the long-standing bodge in > platform_device_register_full(). platform_device_register_full looks odd to start with, especially as the coumentation is rather lacking.. >> +static void setup_pdev_archdata(struct platform_device *pdev) > > Bikeshed: painting the generic DMA API properties as "archdata" feels a bit > off-target :/ > >> +{ >> +if (!pdev->dev.coherent_dma_mask) >> +pdev->dev.coherent_dma_mask = DMA_BIT_MASK(32); >> +if (!pdev->dma_mask) >> +pdev->dma_mask = DMA_BIT_MASK(32); >> +if (!pdev->dev.dma_mask) >> +pdev->dev.dma_mask = >dma_mask; >> +arch_setup_pdev_archdata(pdev); > > AFAICS m68k's implementation of that arch hook becomes entirely redundant > after this change, so may as well go. That would just leave powerpc's > actual archdata, which at a glance looks like it could probably be cleaned > up with not *too* much trouble. Actually I think we can just kill both off. At the point archdata is indeed entirely misnamed. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: next take at setting up a dma mask by default for platform devices
On Thu, Aug 15, 2019 at 03:23:18PM +0200, Greg Kroah-Hartman wrote: > I've taken the first 2 patches for 5.3-final. Given that patch 3 needs > to be fixed, I'll wait for a respin of these before considering them. I have a respun version ready, but I'd really like to hear some comments from usb developers about the approach before spamming everyone again..
Re: next take at setting up a dma mask by default for platform devices
On Sun, Aug 11, 2019 at 10:05:14AM +0200, Christoph Hellwig wrote: > Hi all, > > this is another attempt to make sure the dma_mask pointer is always > initialized for platform devices. Not doing so led to lots of > boilerplate code, and makes platform devices different from all our > major busses like PCI where we always set up a dma_mask. In the long > run this should also help to eventually make dma_mask a scalar value > instead of a pointer and remove even more cruft. > > The bigger blocker for this last time was the fact that the usb > subsystem uses the presence or lack of a dma_mask to check if the core > should do dma mapping for the driver, which is highly unusual. So we > fix this first. Note that this has some overlap with the pending > desire to use the proper dma_mmap_coherent helper for mapping usb > buffers. The first two patches from this series should probably > go into 5.3 and then be used as the basis for the decision to use > dma_mmap_coherent. I've taken the first 2 patches for 5.3-final. Given that patch 3 needs to be fixed, I'll wait for a respin of these before considering them. thanks, greg k-h
Re: [PATCH 6/6] driver core: initialize a default DMA mask for platform device
On Sun, Aug 11, 2019 at 10:05:20AM +0200, Christoph Hellwig wrote: > We still treat devices without a DMA mask as defaulting to 32-bits for > both mask, but a few releases ago we've started warning about such > cases, as they require special cases to work around this sloppyness. > Add a dma_mask field to struct platform_object so that we can initialize > the dma_mask pointer in struct device and initialize both masks to > 32-bits by default. Architectures can still override this in > arch_setup_pdev_archdata if needed. > > Note that the code looks a little odd with the various conditionals > because we have to support platform_device structures that are > statically allocated. > > Signed-off-by: Christoph Hellwig > --- > drivers/base/platform.c | 15 +-- > include/linux/platform_device.h | 1 + > 2 files changed, 14 insertions(+), 2 deletions(-) > > diff --git a/drivers/base/platform.c b/drivers/base/platform.c > index ec974ba9c0c4..b216fcb0a8af 100644 > --- a/drivers/base/platform.c > +++ b/drivers/base/platform.c > @@ -264,6 +264,17 @@ struct platform_object { > char name[]; > }; > > +static void setup_pdev_archdata(struct platform_device *pdev) > +{ > + if (!pdev->dev.coherent_dma_mask) > + pdev->dev.coherent_dma_mask = DMA_BIT_MASK(32); > + if (!pdev->dma_mask) > + pdev->dma_mask = DMA_BIT_MASK(32); > + if (!pdev->dev.dma_mask) > + pdev->dev.dma_mask = >dma_mask; > + arch_setup_pdev_archdata(pdev); > +}; > + > /** > * platform_device_put - destroy a platform device > * @pdev: platform device to free > @@ -310,7 +321,7 @@ struct platform_device *platform_device_alloc(const char > *name, int id) > pa->pdev.id = id; > device_initialize(>pdev.dev); > pa->pdev.dev.release = platform_device_release; > - arch_setup_pdev_archdata(>pdev); > + setup_pdev_archdata(>pdev); > } > > return pa ? 
&pa->pdev : NULL; > @@ -512,7 +523,7 @@ EXPORT_SYMBOL_GPL(platform_device_del); > int platform_device_register(struct platform_device *pdev) > { > device_initialize(&pdev->dev); > - arch_setup_pdev_archdata(pdev); > + setup_pdev_archdata(pdev); > return platform_device_add(pdev); > } > EXPORT_SYMBOL_GPL(platform_device_register); > diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h > index 9bc36b589827..a2abde2aef25 100644 > --- a/include/linux/platform_device.h > +++ b/include/linux/platform_device.h > @@ -24,6 +24,7 @@ struct platform_device { > int id; > bool id_auto; > struct device dev; > + u64 dma_mask; Why is the dma_mask in 'struct device' which is part of this structure, not sufficient here? Shouldn't the "platform" be setting that up correctly already in the "archdata" type callback? confused, greg k-h
Re: [PATCH 02/13] iommu/io-pgtable-arm: Remove redundant call to io_pgtable_tlb_sync()
On 14/08/2019 18:56, Will Deacon wrote: Commit b6b65ca20bc9 ("iommu/io-pgtable-arm: Add support for non-strict mode") added an unconditional call to io_pgtable_tlb_sync() immediately after the case where we replace a block entry with a table entry during an unmap() call. This is redundant, since the IOMMU API will call iommu_tlb_sync() on this path and the patch in question mentions this: | To save having to reason about it too much, make sure the invalidation | in arm_lpae_split_blk_unmap() just performs its own unconditional sync | to minimise the window in which we're technically violating the break- | before-make requirement on a live mapping. This might work out redundant | with an outer-level sync for strict unmaps, but we'll never be splitting | blocks on a DMA fastpath anyway. However, this sync gets in the way of deferred TLB invalidation for leaf entries and is at best a questionable, unproven hack. Remove it. Hey, that's my questionable, unproven hack! :P It's not entirely clear to me how this gets in the way though - AFAICS the intent of tlb_flush_leaf exactly matches the desired operation here, so couldn't these just wait to be converted in patch #8? In principle the concern is that if the caller splits a block with iommu_unmap_fast(), there's no guarantee of seeing an iommu_tlb_sync() before returning to the caller, and thus there's the potential to run into a TLB conflict on a subsequent access even if the endpoint was "good" and didn't make any accesses *during* the unmap call. Robin. 
Signed-off-by: Will Deacon --- drivers/iommu/io-pgtable-arm-v7s.c | 1 - drivers/iommu/io-pgtable-arm.c | 1 - 2 files changed, 2 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c index 0fc8dfab2abf..a62733c6a632 100644 --- a/drivers/iommu/io-pgtable-arm-v7s.c +++ b/drivers/iommu/io-pgtable-arm-v7s.c @@ -587,7 +587,6 @@ static size_t arm_v7s_split_blk_unmap(struct arm_v7s_io_pgtable *data, } io_pgtable_tlb_add_flush(>iop, iova, size, size, true); - io_pgtable_tlb_sync(>iop); return size; } diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 161a7d56264d..0d6633921c1e 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -583,7 +583,6 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data, tablep = iopte_deref(pte, data); } else if (unmap_idx >= 0) { io_pgtable_tlb_add_flush(>iop, iova, size, size, true); - io_pgtable_tlb_sync(>iop); return size; } ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 0/2] iommu/iova: enhance the rcache optimization
v1 --> v2 1. I did not change the patches but added this cover letter. 2. Added a batch of reviewers based on 9257b4a206fc ("iommu/iova: introduce per-cpu caching to iova allocation") 3. I described the problem I met in patch 2, but I hope the brief description below can help people quickly understand it. Suppose there are six rcache sizes, and each size can hold at most 10000 IOVAs: | 4K | 8K | 16K | 32K | 64K | 128K | | 10000 | 9000 | 8500 | 8600 | 9200 | 7000 | As the map above shows, the whole rcache has buffered too many IOVAs. Now the worst case can happen: suppose we need 20000 4K IOVAs at one time. That means 10000 IOVAs can be allocated from the rcache, but the other 10000 IOVAs have to be allocated from the RB tree via the alloc_iova() function. But the RB tree currently holds at least (9000 + 8500 + 8600 + 9200 + 7000) = 42300 nodes, so the average RB tree traversal will be very slow. For my test scenario, the 4K IOVAs are frequently used, but the others are not. Similarly, when the 20000 4K IOVAs are subsequently freed, the first 10000 can be quickly buffered, but the other 10000 cannot. Zhen Lei (2): iommu/iova: introduce iova_magazine_compact_pfns() iommu/iova: enhance the rcache optimization drivers/iommu/iova.c | 100 +++ include/linux/iova.h | 1 + 2 files changed, 95 insertions(+), 6 deletions(-) -- 1.8.3
[PATCH v2 1/2] iommu/iova: introduce iova_magazine_compact_pfns()
iova_magazine_free_pfns() can only free the whole magazine buffer, add iova_magazine_compact_pfns() to support free part of it. Signed-off-by: Zhen Lei --- drivers/iommu/iova.c | 17 - 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index 3e1a8a6755723a9..4b7a9efa0ef40af 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -795,18 +795,19 @@ static void iova_magazine_free(struct iova_magazine *mag) kfree(mag); } -static void -iova_magazine_free_pfns(struct iova_magazine *mag, struct iova_domain *iovad) +static void iova_magazine_compact_pfns(struct iova_magazine *mag, + struct iova_domain *iovad, + unsigned long newsize) { unsigned long flags; int i; - if (!mag) + if (!mag || mag->size <= newsize) return; spin_lock_irqsave(>iova_rbtree_lock, flags); - for (i = 0 ; i < mag->size; ++i) { + for (i = newsize; i < mag->size; ++i) { struct iova *iova = private_find_iova(iovad, mag->pfns[i]); BUG_ON(!iova); @@ -815,7 +816,13 @@ static void iova_magazine_free(struct iova_magazine *mag) spin_unlock_irqrestore(>iova_rbtree_lock, flags); - mag->size = 0; + mag->size = newsize; +} + +static void +iova_magazine_free_pfns(struct iova_magazine *mag, struct iova_domain *iovad) +{ + iova_magazine_compact_pfns(mag, iovad, 0); } static bool iova_magazine_full(struct iova_magazine *mag) -- 1.8.3 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 2/2] iommu/iova: enhance the rcache optimization
The rcache method caches freed IOVAs to improve the performance of IOVA allocation and release. This is usually fine, but performance may degrade in some special scenarios. For example, IOVA_RANGE_CACHE_MAX_SIZE is currently 6, and each size has MAX_GLOBAL_MAGS=32 shareable depot magazines, plus two magazines per CPU (cpu_rcaches->loaded and cpu_rcaches->prev). In an extreme case, it can cache up to ((num_possible_cpus() * 2 + 32) * 128 * 6) IOVAs, which is very large. The worst case happens when the depot magazines of a certain size (usually 4K) are full: further free_iova_fast() invocations then cause iova_magazine_free_pfns() to be called. As said above, too many IOVAs are buffered, so the RB tree is very large, and both iova_magazine_free_pfns()-->private_find_iova() and the missed allocation path alloc_iova()-->__alloc_and_insert_iova_range() spend too much time. Moreover, the current rcache method has no cleanup operation, so the number of buffered IOVAs can only grow, never shrink. In my FIO stress test scenario, performance dropped by about 35% and could not recover even after re-executing the test cases.

Jobs: 21 (f=21): [2.3% done] [8887M/0K /s] [2170K/0 iops]
Jobs: 21 (f=21): [2.3% done] [8902M/0K /s] [2173K/0 iops]
Jobs: 21 (f=21): [2.3% done] [6010M/0K /s] [1467K/0 iops]
Jobs: 21 (f=21): [2.3% done] [5397M/0K /s] [1318K/0 iops]

So I add statistics to the rcache; when the above case happens, the IOVAs which were not hit are released. 
Jobs: 21 (f=21): [100.0% done] [10324M/0K /s] [2520K/0 iops]
Jobs: 21 (f=21): [100.0% done] [10290M/0K /s] [2512K/0 iops]
Jobs: 21 (f=21): [100.0% done] [10035M/0K /s] [2450K/0 iops]
Jobs: 21 (f=21): [100.0% done] [10214M/0K /s] [2494K/0 iops]

Signed-off-by: Zhen Lei
---
 drivers/iommu/iova.c | 83 +++-
 include/linux/iova.h | 1 +
 2 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 4b7a9efa0ef40af..f3828f4add25375 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -23,6 +23,8 @@ static unsigned long iova_rcache_get(struct iova_domain *iovad,
 				     unsigned long limit_pfn);
 static void init_iova_rcaches(struct iova_domain *iovad);
 static void free_iova_rcaches(struct iova_domain *iovad);
+static void iova_compact_rcache(struct iova_domain *iovad,
+				struct iova_rcache *curr_rcache);
 static void fq_destroy_all_entries(struct iova_domain *iovad);
 static void fq_flush_timeout(struct timer_list *t);
@@ -781,6 +783,8 @@ struct iova_magazine {
 struct iova_cpu_rcache {
 	spinlock_t lock;
+	bool prev_mag_hit;
+	unsigned long nr_hit;
 	struct iova_magazine *loaded;
 	struct iova_magazine *prev;
 };
@@ -934,6 +938,7 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
 	if (mag_to_free) {
 		iova_magazine_free_pfns(mag_to_free, iovad);
 		iova_magazine_free(mag_to_free);
+		iova_compact_rcache(iovad, rcache);
 	}
 
 	return can_insert;
@@ -971,18 +976,22 @@ static unsigned long __iova_rcache_get(struct iova_rcache *rcache,
 	} else if (!iova_magazine_empty(cpu_rcache->prev)) {
 		swap(cpu_rcache->prev, cpu_rcache->loaded);
 		has_pfn = true;
+		cpu_rcache->prev_mag_hit = true;
 	} else {
 		spin_lock(&rcache->lock);
 		if (rcache->depot_size > 0) {
 			iova_magazine_free(cpu_rcache->loaded);
 			cpu_rcache->loaded = rcache->depot[--rcache->depot_size];
 			has_pfn = true;
+			rcache->depot_mags_hit = true;
 		}
 		spin_unlock(&rcache->lock);
 	}
 
-	if (has_pfn)
+	if (has_pfn) {
+		cpu_rcache->nr_hit++;
 		iova_pfn = iova_magazine_pop(cpu_rcache->loaded, limit_pfn);
+	}
 
 	spin_unlock_irqrestore(&cpu_rcache->lock, flags);
 
@@ -1049,5 +1058,77 @@ void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad)
 	}
 }
 
+static void iova_compact_percpu_mags(struct iova_domain *iovad,
+				     struct iova_rcache *rcache)
+{
+	unsigned int cpu;
+
+	for_each_possible_cpu(cpu) {
+		unsigned long flags;
+		struct iova_cpu_rcache *cpu_rcache;
+
+		cpu_rcache = per_cpu_ptr(rcache->cpu_rcaches, cpu);
+
+		spin_lock_irqsave(&cpu_rcache->lock, flags);
+		if (!cpu_rcache->prev_mag_hit)
+			iova_magazine_free_pfns(cpu_rcache->prev, iovad);
+
+		if (cpu_rcache->nr_hit < IOVA_MAG_SIZE)
+			iova_magazine_compact_pfns(cpu_rcache->loaded,
+						   iovad,
+						   cpu_rcache->nr_hit);
+
+		cpu_rcache->nr_hit = 0;
+
Re: [PATCH 15/15] iommu/arm-smmu: Add context init implementation hook
On 15/08/2019 11:56, Will Deacon wrote: On Fri, Aug 09, 2019 at 06:07:52PM +0100, Robin Murphy wrote: Allocating and initialising a context for a domain is another point where certain implementations are known to want special behaviour. Currently the other half of the Cavium workaround comes into play here, so let's finish the job to get the whole thing right out of the way. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu-impl.c | 39 +-- drivers/iommu/arm-smmu.c | 51 +++ drivers/iommu/arm-smmu.h | 42 +++-- 3 files changed, 86 insertions(+), 46 deletions(-) diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index c8904da08354..7a657d47b6ec 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -48,6 +48,12 @@ const struct arm_smmu_impl calxeda_impl = { }; +struct cavium_smmu { + struct arm_smmu_device smmu; + u32 id_base; +}; +#define to_csmmu(s)container_of(s, struct cavium_smmu, smmu) To be honest with you, I'd just use container_of directly for the two callsites that need it. "to_csmmu" isn't a great name when we're also got the calxeda thing in here. Sure, by this point I was mostly just going for completeness in terms of sketching out an example for subclassing arm_smmu_device. The Tegra patches will now serve as a more complete example anyway, so indeed we can live without it here. static int cavium_cfg_probe(struct arm_smmu_device *smmu) { static atomic_t context_count = ATOMIC_INIT(0); @@ -56,17 +62,46 @@ static int cavium_cfg_probe(struct arm_smmu_device *smmu) * Ensure ASID and VMID allocation is unique across all SMMUs in * the system. 
 	 */
-	smmu->cavium_id_base = atomic_fetch_add(smmu->num_context_banks,
+	to_csmmu(smmu)->id_base = atomic_fetch_add(smmu->num_context_banks,
 						&context_count);
 	dev_notice(smmu->dev, "\tenabling workaround for Cavium erratum 27704\n");
 	return 0;
 }
 
+int cavium_init_context(struct arm_smmu_domain *smmu_domain)
+{
+	u32 id_base = to_csmmu(smmu_domain->smmu)->id_base;
+
+	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2)
+		smmu_domain->cfg.vmid += id_base;
+	else
+		smmu_domain->cfg.asid += id_base;
+
+	return 0;
+}
+
 const struct arm_smmu_impl cavium_impl = {
 	.cfg_probe = cavium_cfg_probe,
+	.init_context = cavium_init_context,
 };
 
+struct arm_smmu_device *cavium_smmu_impl_init(struct arm_smmu_device *smmu)
+{
+	struct cavium_smmu *csmmu;
+
+	csmmu = devm_kzalloc(smmu->dev, sizeof(*csmmu), GFP_KERNEL);
+	if (!csmmu)
+		return ERR_PTR(-ENOMEM);
+
+	csmmu->smmu = *smmu;
+	csmmu->smmu.impl = &cavium_impl;
+
+	devm_kfree(smmu->dev, smmu);
+
+	return &csmmu->smmu;
+}
+
 #define ARM_MMU500_ACTLR_CPRE		(1 << 1)
 
@@ -121,7 +156,7 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
 		smmu->impl = &calxeda_impl;
 
 	if (smmu->model == CAVIUM_SMMUV2)
-		smmu->impl = &cavium_impl;
+		return cavium_smmu_impl_init(smmu);
 
 	if (smmu->model == ARM_MMU500)
 		smmu->impl = &arm_mmu500_impl;

Maybe rework this so we do the calxeda detection first (and return if we match), followed by a switch on smmu->model to make it crystal clear that we match only one?

As I see it, "match only one" is really only a short-term thing, though, so I didn't want to get *too* hung up on it. Ultimately we're going to have cases where we need to combine e.g. MMU-500 implementation quirks with platform integration quirks - I've been mostly planning on coming back to think about that (and potentially rework this whole logic) later, but I guess it wouldn't hurt to plan out a bit more structure from the start. I'll have a hack on that (and all the other comments) today and hopefully have a v2 by tomorrow.

Robin. 
Re: [PATCH v9 08/21] iommu/io-pgtable-arm-v7s: Extend MediaTek 4GB Mode
Ok, I think speaking to Robin helped me a bit with this... On Thu, Aug 15, 2019 at 06:18:38PM +0800, Yong Wu wrote: > On Thu, 2019-08-15 at 10:51 +0100, Will Deacon wrote: > > On Thu, Aug 15, 2019 at 04:47:49PM +0800, Yong Wu wrote: > > > On Wed, 2019-08-14 at 15:41 +0100, Will Deacon wrote: > > > > On Sat, Aug 10, 2019 at 03:58:08PM +0800, Yong Wu wrote: > > > > > MediaTek extend the arm v7s descriptor to support the dram over 4GB. > > > > > > > > > > In the mt2712 and mt8173, it's called "4GB mode", the physical address > > > > > is from 0x4000_ to 0x1_3fff_, but from EMI point of view, it > > > > > is remapped to high address from 0x1__ to 0x1__, the > > > > > bit32 is always enabled. thus, in the M4U, we always enable the bit9 > > > > > for all PTEs which means to enable bit32 of physical address. Here is > > > > > the detailed remap relationship in the "4GB mode": > > > > > CPU PA ->HW PA > > > > > 0x4000_ 0x1_4000_ (Add bit32) > > > > > 0x8000_ 0x1_8000_ ... > > > > > 0xc000_ 0x1_c000_ ... > > > > > 0x1__0x1__ (No change) [...] > > > > The way I would like this quirk to work is that the io-pgtable code > > > > basically sets bit 9 in the pte when bit 32 is set in the physical > > > > address, > > > > and sets bit 4 in the pte when bit 33 is set in the physical address. It > > > > would then do the opposite when converting a pte to a physical address. > > > > > > > > That way, your driver can call the page table code directly with the > > > > high > > > > addresses and we don't have to do any manual offsetting or range > > > > checking > > > > in the page table code. > > > > > > In this case, the mt8183 can work successfully while the "4gb > > > mode"(mt8173/mt2712) can not. > > > > > > In the "4gb mode", As the remap relationship above, we should always add > > > bit32 in pte as we did in [2]. and need add bit32 in the > > > "iova_to_phys"(Not always add.). That means the "4gb mode" has a special > > > flow: > > > a. Always add bit32 in paddr_to_iopte. 
> > > b. Add bit32 only when PA < 0x4000 in iopte_to_paddr. > > > > I think this is probably at the heart of my misunderstanding. What is so > > special about PAs (is this HW PA or CPU PA?) below 0x4000? Is this RAM > > or something else? > > SRAM and HW register that IOMMU can not access. Ok, so redrawing your table from above, I think we can say something like: CPU Physical address 0G 1G 2G 3G 4G 5G |---A---|---B---|---C---|---D---|---E---| +--I/O--+Memory-+ IOMMU output physical address = 4G 5G 6G 7G 8G |---E---|---B---|---C---|---D---| +Memory-+ Do you agree? If so, what happens to region 'A' (the I/O region) in the IOMMU output physical address space. Is it accessible? Anyway, I think it's the job of the driver to convert between the two address spaces, so that: - On ->map(), bit 32 of the CPU physical address is set before calling into the iopgtable code - The result from ->iova_to_phys() should be the result from the iopgtable code, but with the top bit cleared for addresses over 5G. This assumes that: 1. We're ok setting bit 9 in the ptes mapping region 'E'. 2. The IOMMU page-table walker uses CPU physical addresses Are those true? Thanks, Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 08/15] iommu/arm-smmu: Abstract context bank accesses
On 15/08/2019 11:56, Will Deacon wrote: On Fri, Aug 09, 2019 at 06:07:45PM +0100, Robin Murphy wrote: Context bank accesses are fiddly enough to deserve a number of extra helpers to keep the callsites looking sane, even though there are only one or two of each. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu.c | 137 --- 1 file changed, 72 insertions(+), 65 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 72505647b77d..abdcc3f52e2e 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -82,9 +82,6 @@ ((smmu->options & ARM_SMMU_OPT_SECURE_CFG_ACCESS)\ ? 0x400 : 0)) -/* Translation context bank */ -#define ARM_SMMU_CB(smmu, n) ((smmu)->base + (((smmu)->cb_base + (n)) << (smmu)->pgshift)) - #define MSI_IOVA_BASE 0x800 #define MSI_IOVA_LENGTH 0x10 @@ -265,9 +262,29 @@ static void arm_smmu_writel(struct arm_smmu_device *smmu, int page, int offset, writel_relaxed(val, arm_smmu_page(smmu, page) + offset); } +static u64 arm_smmu_readq(struct arm_smmu_device *smmu, int page, int offset) +{ + return readq_relaxed(arm_smmu_page(smmu, page) + offset); +} + +static void arm_smmu_writeq(struct arm_smmu_device *smmu, int page, int offset, + u64 val) +{ + writeq_relaxed(val, arm_smmu_page(smmu, page) + offset); +} + #define arm_smmu_read_gr1(s, r) arm_smmu_readl((s), 1, (r)) #define arm_smmu_write_gr1(s, r, v) arm_smmu_writel((s), 1, (r), (v)) +#define arm_smmu_read_cb(s, n, r)\ + arm_smmu_readl((s), (s)->cb_base + (n), (r)) +#define arm_smmu_write_cb(s, n, r, v) \ + arm_smmu_writel((s), (s)->cb_base + (n), (r), (v)) +#define arm_smmu_read_cb_q(s, n, r)\ + arm_smmu_readq((s), (s)->cb_base + (n), (r)) +#define arm_smmu_write_cb_q(s, n, r, v)\ + arm_smmu_writeq((s), (s)->cb_base + (n), (r), (v)) 'r' for 'offset'? (maybe just rename offset => register in the helpers). 
I think this all represents the mangled remains of an underlying notion of 'register offset' ;) struct arm_smmu_option_prop { u32 opt; const char *prop; @@ -423,15 +440,17 @@ static void __arm_smmu_free_bitmap(unsigned long *map, int idx) } /* Wait for any pending TLB invalidations to complete */ -static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu, - void __iomem *sync, void __iomem *status) +static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu, int page, + int sync, int status) { unsigned int spin_cnt, delay; + u32 reg; - writel_relaxed(QCOM_DUMMY_VAL, sync); + arm_smmu_writel(smmu, page, sync, QCOM_DUMMY_VAL); for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) { for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) { - if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE)) + reg = arm_smmu_readl(smmu, page, status); + if (!(reg & sTLBGSTATUS_GSACTIVE)) return; cpu_relax(); } @@ -443,12 +462,11 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu, static void arm_smmu_tlb_sync_global(struct arm_smmu_device *smmu) { - void __iomem *base = ARM_SMMU_GR0(smmu); unsigned long flags; spin_lock_irqsave(>global_sync_lock, flags); - __arm_smmu_tlb_sync(smmu, base + ARM_SMMU_GR0_sTLBGSYNC, - base + ARM_SMMU_GR0_sTLBGSTATUS); + __arm_smmu_tlb_sync(smmu, 0, ARM_SMMU_GR0_sTLBGSYNC, Can we have a #define for page zero, please? Again, now I recall pondering the exact same thought, so clearly I don't have any grounds to object. I guess it's worth reworking the previous ARM_SMMU_{GR0,GR1,CB()} macros into the page number scheme rather than just killing them off - let me give that a try. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 05/15] iommu/arm-smmu: Split arm_smmu_tlb_inv_range_nosync()
On 15/08/2019 11:56, Will Deacon wrote: On Fri, Aug 09, 2019 at 06:07:42PM +0100, Robin Murphy wrote: Since we now use separate iommu_gather_ops for stage 1 and stage 2 contexts, we may as well divide up the monolithic callback into its respective stage 1 and stage 2 parts. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu.c | 66 ++-- 1 file changed, 37 insertions(+), 29 deletions(-) This will conflict with my iommu API batching stuff, but I can sort that out if/when it gets queued by Joerg. - if (cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) { - iova &= ~12UL; - iova |= cfg->asid; - do { - writel_relaxed(iova, reg); - iova += granule; - } while (size -= granule); - } else { - iova >>= 12; - iova |= (u64)cfg->asid << 48; - do { - writeq_relaxed(iova, reg); - iova += granule >> 12; - } while (size -= granule); - } - } else { - reg += leaf ? ARM_SMMU_CB_S2_TLBIIPAS2L : - ARM_SMMU_CB_S2_TLBIIPAS2; - iova >>= 12; + if (cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) { + iova &= ~12UL; Oh baby. You should move code around more often, so I'm forced to take a second look! Oh dear lord... The worst part is that I do now remember seeing this and having a similar moment of disbelief, but apparently I was easily distracted with rebasing and forgot about it too quickly :( Can you cook a fix for this that we can route separately, please? I see it also made its way into qcom_iommu.c... Sure, I'll split it out to the front of the series for the moment. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 00/13] Rework IOMMU API to allow for batching of invalidation
On 14/08/2019 18:56, Will Deacon wrote:

Hi everybody,

These are the core IOMMU changes that I have posted previously as part of my ongoing effort to reduce the lock contention of the SMMUv3 command queue. I thought it would be better to split this out as a separate series, since I think it's ready to go and all the driver conversions mean that it's quite a pain for me to maintain out of tree!

The idea of the patch series is to allow TLB invalidation to be batched up into a new 'struct iommu_iotlb_gather' structure, which tracks the properties of the virtual address range being invalidated so that it can be deferred until the driver's ->iotlb_sync() function is called. This allows for more efficient invalidation on hardware that can submit multiple invalidations in one go.

The previous series was included in:

https://lkml.kernel.org/r/20190711171927.28803-1-w...@kernel.org

The only real change since then is incorporating the newly merged virtio-iommu driver. If you'd like to play with the patches, then I've also pushed them here:

https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/unmap

but they should behave as a no-op on their own.

Hi Will,

As anticipated, my storage testing scenarios roughly give parity throughput and CPU loading before and after this series.

Patches to convert the Arm SMMUv3 driver to the new API are here:

https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/cmdq

I quickly tested this again and now I see a performance lift:

                    before (5.3-rc1)   after
D05 8x SAS disks    907K IOPS          970K IOPS
D05 1x NVMe         450K IOPS          466K IOPS
D06 1x NVMe         467K IOPS          466K IOPS

The CPU loading seems to track throughput, so nothing much to say there.

Note: From 5.2 testing, I was seeing >900K IOPS from that NVMe disk for !IOMMU.

BTW, what were your thoughts on changing arm_smmu_atc_inv_domain()->arm_smmu_atc_inv_master() to batching? It seems suitable, but looks untouched. 
Were you waiting for a resolution to the performance issue which Leizhen reported? Thanks, John Cheers, Will --->8 Cc: Jean-Philippe Brucker Cc: Robin Murphy Cc: Jayachandran Chandrasekharan Nair Cc: Jan Glauber Cc: Jon Masters Cc: Eric Auger Cc: Zhen Lei Cc: Jonathan Cameron Cc: Vijay Kilary Cc: Joerg Roedel Cc: John Garry Cc: Alex Williamson Cc: Marek Szyprowski Cc: David Woodhouse Will Deacon (13): iommu: Remove empty iommu_tlb_range_add() callback from iommu_ops iommu/io-pgtable-arm: Remove redundant call to io_pgtable_tlb_sync() iommu/io-pgtable: Rename iommu_gather_ops to iommu_flush_ops iommu: Introduce struct iommu_iotlb_gather for batching TLB flushes iommu: Introduce iommu_iotlb_gather_add_page() iommu: Pass struct iommu_iotlb_gather to ->unmap() and ->iotlb_sync() iommu/io-pgtable: Introduce tlb_flush_walk() and tlb_flush_leaf() iommu/io-pgtable: Hook up ->tlb_flush_walk() and ->tlb_flush_leaf() in drivers iommu/io-pgtable-arm: Call ->tlb_flush_walk() and ->tlb_flush_leaf() iommu/io-pgtable: Replace ->tlb_add_flush() with ->tlb_add_page() iommu/io-pgtable: Remove unused ->tlb_sync() callback iommu/io-pgtable: Pass struct iommu_iotlb_gather to ->unmap() iommu/io-pgtable: Pass struct iommu_iotlb_gather to ->tlb_add_page() drivers/gpu/drm/panfrost/panfrost_mmu.c | 24 +--- drivers/iommu/amd_iommu.c | 11 ++-- drivers/iommu/arm-smmu-v3.c | 52 +++- drivers/iommu/arm-smmu.c| 103 drivers/iommu/dma-iommu.c | 9 ++- drivers/iommu/exynos-iommu.c| 3 +- drivers/iommu/intel-iommu.c | 3 +- drivers/iommu/io-pgtable-arm-v7s.c | 57 +- drivers/iommu/io-pgtable-arm.c | 48 --- drivers/iommu/iommu.c | 24 drivers/iommu/ipmmu-vmsa.c | 28 + drivers/iommu/msm_iommu.c | 42 + drivers/iommu/mtk_iommu.c | 45 +++--- drivers/iommu/mtk_iommu_v1.c| 3 +- drivers/iommu/omap-iommu.c | 2 +- drivers/iommu/qcom_iommu.c | 44 +++--- drivers/iommu/rockchip-iommu.c | 2 +- drivers/iommu/s390-iommu.c | 3 +- drivers/iommu/tegra-gart.c | 12 +++- drivers/iommu/tegra-smmu.c | 2 +- 
drivers/iommu/virtio-iommu.c| 5 +- drivers/vfio/vfio_iommu_type1.c | 27 + include/linux/io-pgtable.h | 57 -- include/linux/iommu.h | 92 +--- 24 files changed, 483 insertions(+), 215 deletions(-) ___ iommu mailing list iommu@lists.linux-foundation.org
Re: [PATCH 04/15] iommu/arm-smmu: Rework cb_base handling
On 14/08/2019 19:05, Will Deacon wrote: On Fri, Aug 09, 2019 at 06:07:41PM +0100, Robin Murphy wrote: To keep register-access quirks manageable, we want to structure things to avoid needing too many individual overrides. It seems fairly clean to have a single interface which handles both global and context registers in terms of the architectural pages, so the first preparatory step is to rework cb_base into a page number rather than an absolute address. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu.c | 22 -- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index d9a93e5f422f..463bc8d98adb 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -95,7 +95,7 @@ #endif /* Translation context bank */ -#define ARM_SMMU_CB(smmu, n) ((smmu)->cb_base + ((n) << (smmu)->pgshift)) +#define ARM_SMMU_CB(smmu, n) ((smmu)->base + (((smmu)->cb_base + (n)) << (smmu)->pgshift)) #define MSI_IOVA_BASE 0x800 #define MSI_IOVA_LENGTH 0x10 @@ -168,8 +168,8 @@ struct arm_smmu_device { struct device *dev; void __iomem *base; - void __iomem*cb_base; - unsigned long pgshift; + unsigned intcb_base; I think this is now a misnomer. Would you be able to rename it cb_pfn or something, please? Good point; in the architectural terms (section 8.1 of the spec), SMMU_CB_BASE is strictly a byte offset from SMMU_BASE, and the quantity we now have here is actually NUMPAGE. I've renamed it as such and tweaked the comments to be a bit more useful too. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 0/5] iommu/amd: Convert the AMD iommu driver to the dma-iommu api
Done, I just sent it there. I don't have any AMD hardware to test on while I'm traveling. However the rebase was very straightforward and the code was tested a month ago on the old linux-next. I only have the AMD conversion done. I will work on rebasing the intel one when I get a chance. On Tue, 13 Aug 2019 at 14:07, Christoph Hellwig wrote: > > On Tue, Aug 13, 2019 at 08:09:26PM +0800, Tom Murphy wrote: > > Hi Christoph, > > > > I quit my job and am having a great time traveling South East Asia. > > Enjoy! I just returned from my vacation. > > > I definitely don't want this work to go to waste and I hope to repost it > > later this week but I can't guarantee it. > > > > Let me know if you need this urgently. > > It isn't in any strict sense urgent. I just have various DMA API plans > that I'd rather just implement in dma-direct and dma-iommu rather than > also in two additional commonly used iommu drivers. So on the one had > the sooner the better, on the other hand no real urgency. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH V5 5/5] iommu/amd: Convert AMD iommu driver to the dma-iommu api
Convert the AMD iommu driver to the dma-iommu api. Remove the iova handling and reserve region code from the AMD iommu driver. Signed-off-by: Tom Murphy --- drivers/iommu/Kconfig | 1 + drivers/iommu/amd_iommu.c | 677 -- 2 files changed, 68 insertions(+), 610 deletions(-) diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index e15cdcd8cb3c..437428571512 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -138,6 +138,7 @@ config AMD_IOMMU select PCI_PASID select IOMMU_API select IOMMU_IOVA + select IOMMU_DMA depends on X86_64 && PCI && ACPI ---help--- With this option you can enable support for AMD IOMMU hardware in diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 0e53f9bd2be7..eb4801031a99 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include #include @@ -89,8 +90,6 @@ const struct iommu_ops amd_iommu_ops; static ATOMIC_NOTIFIER_HEAD(ppr_notifier); int amd_iommu_max_glx_val = -1; -static const struct dma_map_ops amd_iommu_dma_ops; - /* * general struct to manage commands send to an IOMMU */ @@ -103,21 +102,6 @@ struct kmem_cache *amd_iommu_irq_cache; static void update_domain(struct protection_domain *domain); static int protection_domain_init(struct protection_domain *domain); static void detach_device(struct device *dev); -static void iova_domain_flush_tlb(struct iova_domain *iovad); - -/* - * Data container for a dma_ops specific protection domain - */ -struct dma_ops_domain { - /* generic protection domain information */ - struct protection_domain domain; - - /* IOVA RB-Tree */ - struct iova_domain iovad; -}; - -static struct iova_domain reserved_iova_ranges; -static struct lock_class_key reserved_rbtree_key; / * @@ -188,12 +172,6 @@ static struct protection_domain *to_pdomain(struct iommu_domain *dom) return container_of(dom, struct protection_domain, domain); } -static struct dma_ops_domain* to_dma_ops_domain(struct 
protection_domain *domain) -{ - BUG_ON(domain->flags != PD_DMA_OPS_MASK); - return container_of(domain, struct dma_ops_domain, domain); -} - static struct iommu_dev_data *alloc_dev_data(u16 devid) { struct iommu_dev_data *dev_data; @@ -1267,12 +1245,6 @@ static void domain_flush_pages(struct protection_domain *domain, __domain_flush_pages(domain, address, size, 0); } -/* Flush the whole IO/TLB for a given protection domain */ -static void domain_flush_tlb(struct protection_domain *domain) -{ - __domain_flush_pages(domain, 0, CMD_INV_IOMMU_ALL_PAGES_ADDRESS, 0); -} - /* Flush the whole IO/TLB for a given protection domain - including PDE */ static void domain_flush_tlb_pde(struct protection_domain *domain) { @@ -1674,43 +1646,6 @@ static unsigned long iommu_unmap_page(struct protection_domain *dom, return unmapped; } -/ - * - * The next functions belong to the address allocator for the dma_ops - * interface functions. - * - / - - -static unsigned long dma_ops_alloc_iova(struct device *dev, - struct dma_ops_domain *dma_dom, - unsigned int pages, u64 dma_mask) -{ - unsigned long pfn = 0; - - pages = __roundup_pow_of_two(pages); - - if (dma_mask > DMA_BIT_MASK(32)) - pfn = alloc_iova_fast(_dom->iovad, pages, - IOVA_PFN(DMA_BIT_MASK(32)), false); - - if (!pfn) - pfn = alloc_iova_fast(_dom->iovad, pages, - IOVA_PFN(dma_mask), true); - - return (pfn << PAGE_SHIFT); -} - -static void dma_ops_free_iova(struct dma_ops_domain *dma_dom, - unsigned long address, - unsigned int pages) -{ - pages = __roundup_pow_of_two(pages); - address >>= PAGE_SHIFT; - - free_iova_fast(_dom->iovad, address, pages); -} - / * * The next functions belong to the domain allocation. 
A domain is
@@ -1787,38 +1722,23 @@ static void free_gcr3_table(struct protection_domain *domain)
 	free_page((unsigned long)domain->gcr3_tbl);
 }
 
-static void dma_ops_domain_flush_tlb(struct dma_ops_domain *dom)
-{
-	domain_flush_tlb(&dom->domain);
-	domain_flush_complete(&dom->domain);
-}
-
-static void iova_domain_flush_tlb(struct iova_domain *iovad)
-{
-	struct dma_ops_domain *dom;
-
-	dom = container_of(iovad, struct dma_ops_domain, iovad);
-
-	dma_ops_domain_flush_tlb(dom);
-}
[PATCH V5 3/5] iommu/dma-iommu: Handle deferred devices
Handle devices which defer their attach to the iommu in the dma-iommu api Signed-off-by: Tom Murphy --- drivers/iommu/dma-iommu.c | 27 ++- 1 file changed, 26 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 2712fbc68b28..906b7fa14d3c 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -22,6 +22,7 @@ #include #include #include +#include struct iommu_dma_msi_page { struct list_headlist; @@ -351,6 +352,21 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, return iova_reserve_iommu_regions(dev, domain); } +static int handle_deferred_device(struct device *dev, + struct iommu_domain *domain) +{ + const struct iommu_ops *ops = domain->ops; + + if (!is_kdump_kernel()) + return 0; + + if (unlikely(ops->is_attach_deferred && + ops->is_attach_deferred(domain, dev))) + return iommu_attach_device(domain, dev); + + return 0; +} + /** * dma_info_to_prot - Translate DMA API directions and attributes to IOMMU API *page flags. 
@@ -463,6 +479,9 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys, size_t iova_off = iova_offset(iovad, phys); dma_addr_t iova; + if (unlikely(handle_deferred_device(dev, domain))) + return DMA_MAPPING_ERROR; + size = iova_align(iovad, size + iova_off); iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev); @@ -581,6 +600,9 @@ static void *iommu_dma_alloc_remap(struct device *dev, size_t size, *dma_handle = DMA_MAPPING_ERROR; + if (unlikely(handle_deferred_device(dev, domain))) + return NULL; + min_size = alloc_sizes & -alloc_sizes; if (min_size < PAGE_SIZE) { min_size = PAGE_SIZE; @@ -713,7 +735,7 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page, int prot = dma_info_to_prot(dir, coherent, attrs); dma_addr_t dma_handle; - dma_handle =__iommu_dma_map(dev, phys, size, prot); + dma_handle = __iommu_dma_map(dev, phys, size, prot); if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) && dma_handle != DMA_MAPPING_ERROR) arch_sync_dma_for_device(dev, phys, size, dir); @@ -823,6 +845,9 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg, unsigned long mask = dma_get_seg_boundary(dev); int i; + if (unlikely(handle_deferred_device(dev, domain))) + return 0; + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) iommu_dma_sync_sg_for_device(dev, sg, nents, dir); -- 2.20.1
[PATCH V5 4/5] iommu/dma-iommu: Use the dev->coherent_dma_mask
Use the dev->coherent_dma_mask when allocating in the dma-iommu ops api. Signed-off-by: Tom Murphy --- drivers/iommu/dma-iommu.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 906b7fa14d3c..b9a3ab02434b 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -471,7 +471,7 @@ static void __iommu_dma_unmap(struct device *dev, dma_addr_t dma_addr, } static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys, - size_t size, int prot) + size_t size, int prot, dma_addr_t dma_mask) { struct iommu_domain *domain = iommu_get_dma_domain(dev); struct iommu_dma_cookie *cookie = domain->iova_cookie; @@ -484,7 +484,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys, size = iova_align(iovad, size + iova_off); - iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev); + iova = iommu_dma_alloc_iova(domain, size, dma_mask, dev); if (!iova) return DMA_MAPPING_ERROR; @@ -735,7 +735,7 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page, int prot = dma_info_to_prot(dir, coherent, attrs); dma_addr_t dma_handle; - dma_handle = __iommu_dma_map(dev, phys, size, prot); + dma_handle = __iommu_dma_map(dev, phys, size, prot, dma_get_mask(dev)); if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) && dma_handle != DMA_MAPPING_ERROR) arch_sync_dma_for_device(dev, phys, size, dir); @@ -938,7 +938,8 @@ static dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys, size_t size, enum dma_data_direction dir, unsigned long attrs) { return __iommu_dma_map(dev, phys, size, - dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO); + dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO, + dma_get_mask(dev)); } static void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle, @@ -1041,7 +1042,8 @@ static void *iommu_dma_alloc(struct device *dev, size_t size, if (!cpu_addr) return NULL; - *handle = __iommu_dma_map(dev, 
page_to_phys(page), size, ioprot); + *handle = __iommu_dma_map(dev, page_to_phys(page), size, ioprot, + dev->coherent_dma_mask); if (*handle == DMA_MAPPING_ERROR) { __iommu_dma_free(dev, size, cpu_addr); return NULL; -- 2.20.1
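The distinction the patch enforces — coherent allocations bounded by dev->coherent_dma_mask, streaming mappings by dev->dma_mask — can be reduced to a small userspace sketch. The struct and function names below are illustrative stand-ins, not kernel code:

```c
#include <stdint.h>

typedef uint64_t dma_addr_t;

/* Hypothetical stand-in for the two masks on struct device;
 * this is an illustration, not the kernel's struct device. */
struct toy_device {
    dma_addr_t dma_mask;          /* limit for streaming DMA mappings */
    dma_addr_t coherent_dma_mask; /* limit for coherent allocations */
};

/* The point of the patch: the IOVA allocator must honour the mask
 * that matches the mapping type, so iommu_dma_alloc() must pass
 * coherent_dma_mask while map_page/map_resource pass dma_mask. */
static dma_addr_t iova_limit(const struct toy_device *dev, int coherent)
{
    return coherent ? dev->coherent_dma_mask : dev->dma_mask;
}
```

For a device that can stream 64-bit addresses but only allocate coherent memory below 4 GiB, `iova_limit` returns the 32-bit mask only on the coherent path, which is exactly the behaviour the diff above introduces.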
[PATCH V5 2/5] iommu: Add gfp parameter to iommu_ops::map
Add a gfp_t parameter to the iommu_ops::map function. Remove the needless locking in the AMD iommu driver. The iommu_ops::map function (or the iommu_map function which calls it) was always supposed to be sleepable (according to Joerg's comment in this thread: https://lore.kernel.org/patchwork/patch/977520/ ) and so should probably have had a "might_sleep()" since it was written. However, currently the dma-iommu api can call iommu_map in an atomic context, which it shouldn't do. This doesn't cause any problems because any iommu driver which uses the dma-iommu api uses GFP_ATOMIC in its iommu_ops::map function. But doing this wastes the memory allocator's atomic pools. Signed-off-by: Tom Murphy --- drivers/iommu/amd_iommu.c | 3 ++- drivers/iommu/arm-smmu-v3.c| 2 +- drivers/iommu/arm-smmu.c | 2 +- drivers/iommu/dma-iommu.c | 6 ++--- drivers/iommu/exynos-iommu.c | 2 +- drivers/iommu/intel-iommu.c| 2 +- drivers/iommu/iommu.c | 43 +- drivers/iommu/ipmmu-vmsa.c | 2 +- drivers/iommu/msm_iommu.c | 2 +- drivers/iommu/mtk_iommu.c | 2 +- drivers/iommu/mtk_iommu_v1.c | 2 +- drivers/iommu/omap-iommu.c | 2 +- drivers/iommu/qcom_iommu.c | 2 +- drivers/iommu/rockchip-iommu.c | 2 +- drivers/iommu/s390-iommu.c | 2 +- drivers/iommu/tegra-gart.c | 2 +- drivers/iommu/tegra-smmu.c | 2 +- drivers/iommu/virtio-iommu.c | 2 +- include/linux/iommu.h | 21 - 19 files changed, 77 insertions(+), 26 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 1948be7ac8f8..0e53f9bd2be7 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -3030,7 +3030,8 @@ static int amd_iommu_attach_device(struct iommu_domain *dom, } static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova, -phys_addr_t paddr, size_t page_size, int iommu_prot) +phys_addr_t paddr, size_t page_size, int iommu_prot, +gfp_t gfp) { struct protection_domain *domain = to_pdomain(dom); int prot = 0; diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 
e7f49fd1a7ba..acc0eae7963f 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -1975,7 +1975,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) } static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova, - phys_addr_t paddr, size_t size, int prot) + phys_addr_t paddr, size_t size, int prot, gfp_t gfp) { struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops; diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index aa06498f291d..05f42bdee494 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -1284,7 +1284,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) } static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova, - phys_addr_t paddr, size_t size, int prot) + phys_addr_t paddr, size_t size, int prot, gfp_t gfp) { struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops; struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu; diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index d991d40f797f..2712fbc68b28 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -469,7 +469,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys, if (!iova) return DMA_MAPPING_ERROR; - if (iommu_map(domain, iova, phys - iova_off, size, prot)) { + if (iommu_map_atomic(domain, iova, phys - iova_off, size, prot)) { iommu_dma_free_iova(cookie, iova, size); return DMA_MAPPING_ERROR; } @@ -613,7 +613,7 @@ static void *iommu_dma_alloc_remap(struct device *dev, size_t size, arch_dma_prep_coherent(sg_page(sg), sg->length); } - if (iommu_map_sg(domain, iova, sgt.sgl, sgt.orig_nents, ioprot) + if (iommu_map_sg_atomic(domain, iova, sgt.sgl, sgt.orig_nents, ioprot) < size) goto out_free_sg; @@ -873,7 +873,7 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg, * We'll leave any physical concatenation to the IOMMU driver's * implementation - it 
knows better than we do. */ - if (iommu_map_sg(domain, iova, sg, nents, prot) < iova_len) + if (iommu_map_sg_atomic(domain, iova, sg, nents, prot) < iova_len) goto out_free_iova; return __finalise_sg(dev, sg, nents, iova); diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c index 1934c16a5abc..b7dd46884692 100644 --- a/drivers/iommu/exynos-iommu.c +++
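The split this patch introduces — one core mapping path that takes a gfp, with a sleepable wrapper and an atomic wrapper — can be sketched in a few lines. This is toy code modelling the pattern, not the kernel implementation:

```c
#include <stddef.h>

typedef unsigned int gfp_t;
#define GFP_KERNEL 0x01u
#define GFP_ATOMIC 0x02u

static gfp_t last_gfp; /* records what the core path was asked to use */

/* Toy core mapper: the real __iommu_map() would walk page tables and
 * pass gfp down to iommu_ops::map for page-table allocations. */
static int toy_iommu_map(unsigned long iova, size_t size, gfp_t gfp)
{
    last_gfp = gfp;
    return 0;
}

/* Sleepable callers (may block waiting for memory). */
static int toy_map(unsigned long iova, size_t size)
{
    /* the kernel version could assert might_sleep() here */
    return toy_iommu_map(iova, size, GFP_KERNEL);
}

/* Atomic-context callers, e.g. the dma-iommu fast path. */
static int toy_map_atomic(unsigned long iova, size_t size)
{
    return toy_iommu_map(iova, size, GFP_ATOMIC);
}
```

The benefit is that only callers that genuinely run in atomic context pay for the atomic pools; everyone else gets GFP_KERNEL allocations.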
[PATCH V5 1/5] iommu/amd: Remove unnecessary locking from AMD iommu driver
We can remove the mutex lock from amd_iommu_map and amd_iommu_unmap. iommu_map doesn't lock while mapping and so no two calls should touch the same iova range. The AMD driver already handles the page table page allocations without locks so we can safely remove the locks. Signed-off-by: Tom Murphy --- drivers/iommu/amd_iommu.c | 10 +- drivers/iommu/amd_iommu_types.h | 1 - 2 files changed, 1 insertion(+), 10 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 008da21a2592..1948be7ac8f8 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -2858,7 +2858,6 @@ static void protection_domain_free(struct protection_domain *domain) static int protection_domain_init(struct protection_domain *domain) { spin_lock_init(&domain->lock); - mutex_init(&domain->api_lock); domain->id = domain_id_alloc(); if (!domain->id) return -ENOMEM; @@ -3045,9 +3044,7 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova, if (iommu_prot & IOMMU_WRITE) prot |= IOMMU_PROT_IW; - mutex_lock(&domain->api_lock); ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL); - mutex_unlock(&domain->api_lock); domain_flush_np_cache(domain, iova, page_size); @@ -3058,16 +3055,11 @@ static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova, size_t page_size) { struct protection_domain *domain = to_pdomain(dom); - size_t unmap_size; if (domain->mode == PAGE_MODE_NONE) return 0; - mutex_lock(&domain->api_lock); - unmap_size = iommu_unmap_page(domain, iova, page_size); - mutex_unlock(&domain->api_lock); - - return unmap_size; + return iommu_unmap_page(domain, iova, page_size); } static phys_addr_t amd_iommu_iova_to_phys(struct iommu_domain *dom, diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h index 9ac229e92b07..b764e1a73dcf 100644 --- a/drivers/iommu/amd_iommu_types.h +++ b/drivers/iommu/amd_iommu_types.h @@ -468,7 +468,6 @@ struct protection_domain { struct iommu_domain domain; /* generic domain handle used by iommu core code 
*/ spinlock_t lock; /* mostly used to lock the page table */ - struct mutex api_lock; /* protect page tables in the iommu-api path */ u16 id; /* the domain id written to the device table */ int mode; /* paging mode (0-6 levels) */ u64 *pt_root; /* page table root pointer */ -- 2.20.1
[PATCH V5 0/5] iommu/amd: Convert the AMD iommu driver to the dma-iommu api
Convert the AMD iommu driver to the dma-iommu api. Remove the iova handling and reserve region code from the AMD iommu driver. Change-log: V5: -Rebase on top of linux-next V4: -Rebase on top of linux-next -Split the removing of the unnecessary locking in the amd iommu driver into a separate patch -refactor the "iommu/dma-iommu: Handle deferred devices" patch and address comments v3: -rename dma_limit to dma_mask -exit handle_deferred_device early if (!is_kdump_kernel()) -remove pointless calls to handle_deferred_device v2: -Rebase on top of this series: http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-iommu-ops.3 -Add a gfp_t parameter to the iommu_ops::map function. -Made use of the reserve region code inside the dma-iommu api Tom Murphy (5): iommu/amd: Remove unnecessary locking from AMD iommu driver iommu: Add gfp parameter to iommu_ops::map iommu/dma-iommu: Handle deferred devices iommu/dma-iommu: Use the dev->coherent_dma_mask iommu/amd: Convert AMD iommu driver to the dma-iommu api drivers/iommu/Kconfig | 1 + drivers/iommu/amd_iommu.c | 690 drivers/iommu/amd_iommu_types.h | 1 - drivers/iommu/arm-smmu-v3.c | 2 +- drivers/iommu/arm-smmu.c| 2 +- drivers/iommu/dma-iommu.c | 43 +- drivers/iommu/exynos-iommu.c| 2 +- drivers/iommu/intel-iommu.c | 2 +- drivers/iommu/iommu.c | 43 +- drivers/iommu/ipmmu-vmsa.c | 2 +- drivers/iommu/msm_iommu.c | 2 +- drivers/iommu/mtk_iommu.c | 2 +- drivers/iommu/mtk_iommu_v1.c| 2 +- drivers/iommu/omap-iommu.c | 2 +- drivers/iommu/qcom_iommu.c | 2 +- drivers/iommu/rockchip-iommu.c | 2 +- drivers/iommu/s390-iommu.c | 2 +- drivers/iommu/tegra-gart.c | 2 +- drivers/iommu/tegra-smmu.c | 2 +- drivers/iommu/virtio-iommu.c| 2 +- include/linux/iommu.h | 21 +- 21 files changed, 178 insertions(+), 651 deletions(-) -- 2.20.1
Re: [PATCH 15/15] iommu/arm-smmu: Add context init implementation hook
On Fri, Aug 09, 2019 at 06:07:52PM +0100, Robin Murphy wrote: > Allocating and initialising a context for a domain is another point > where certain implementations are known to want special behaviour. > Currently the other half of the Cavium workaround comes into play here, > so let's finish the job to get the whole thing right out of the way. > > Signed-off-by: Robin Murphy > --- > drivers/iommu/arm-smmu-impl.c | 39 +-- > drivers/iommu/arm-smmu.c | 51 +++ > drivers/iommu/arm-smmu.h | 42 +++-- > 3 files changed, 86 insertions(+), 46 deletions(-) > > diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c > index c8904da08354..7a657d47b6ec 100644 > --- a/drivers/iommu/arm-smmu-impl.c > +++ b/drivers/iommu/arm-smmu-impl.c > @@ -48,6 +48,12 @@ const struct arm_smmu_impl calxeda_impl = { > }; > > > +struct cavium_smmu { > + struct arm_smmu_device smmu; > + u32 id_base; > +}; > +#define to_csmmu(s) container_of(s, struct cavium_smmu, smmu) To be honest with you, I'd just use container_of directly for the two callsites that need it. "to_csmmu" isn't a great name when we've also got the calxeda thing in here. > static int cavium_cfg_probe(struct arm_smmu_device *smmu) > { > static atomic_t context_count = ATOMIC_INIT(0); > @@ -56,17 +62,46 @@ static int cavium_cfg_probe(struct arm_smmu_device *smmu) >* Ensure ASID and VMID allocation is unique across all SMMUs in >* the system. 
>*/ > - smmu->cavium_id_base = atomic_fetch_add(smmu->num_context_banks, > + to_csmmu(smmu)->id_base = atomic_fetch_add(smmu->num_context_banks, > &context_count); > dev_notice(smmu->dev, "\tenabling workaround for Cavium erratum > 27704\n"); > > return 0; > } > > +int cavium_init_context(struct arm_smmu_domain *smmu_domain) > +{ > + u32 id_base = to_csmmu(smmu_domain->smmu)->id_base; > + > + if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2) > + smmu_domain->cfg.vmid += id_base; > + else > + smmu_domain->cfg.asid += id_base; > + > + return 0; > +} > + > const struct arm_smmu_impl cavium_impl = { > .cfg_probe = cavium_cfg_probe, > + .init_context = cavium_init_context, > }; > > +struct arm_smmu_device *cavium_smmu_impl_init(struct arm_smmu_device *smmu) > +{ > + struct cavium_smmu *csmmu; > + > + csmmu = devm_kzalloc(smmu->dev, sizeof(*csmmu), GFP_KERNEL); > + if (!csmmu) > + return ERR_PTR(-ENOMEM); > + > + csmmu->smmu = *smmu; > + csmmu->smmu.impl = &cavium_impl; > + > + devm_kfree(smmu->dev, smmu); > + > + return &csmmu->smmu; > +} > + > > #define ARM_MMU500_ACTLR_CPRE (1 << 1) > > @@ -121,7 +156,7 @@ struct arm_smmu_device *arm_smmu_impl_init(struct > arm_smmu_device *smmu) > smmu->impl = &calxeda_impl; > > if (smmu->model == CAVIUM_SMMUV2) > - smmu->impl = &cavium_impl; > + return cavium_smmu_impl_init(smmu); > > if (smmu->model == ARM_MMU500) > smmu->impl = &arm_mmu500_impl; Maybe rework this so we do the calxeda detection first (and return if we match), followed by a switch on smmu->model to make it crystal clear that we match only one? Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
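The subclassing idiom under discussion — a vendor struct embedding the generic device struct, recovered later via container_of — can be shown in a self-contained sketch. The struct names are simplified stand-ins for arm_smmu_device / cavium_smmu:

```c
#include <stddef.h>
#include <stdint.h>

/* Same definition the kernel uses: walk back from a member pointer
 * to the enclosing structure. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic device state, embedded first in the vendor wrapper. */
struct base_smmu {
    int num_context_banks;
};

/* Vendor-specific wrapper, mirroring struct cavium_smmu. */
struct toy_cavium_smmu {
    struct base_smmu smmu; /* generic part; core code only sees this */
    uint32_t id_base;      /* Cavium-only field */
};

/* What a to_csmmu()-style helper expands to. */
static struct toy_cavium_smmu *to_toy_csmmu(struct base_smmu *b)
{
    return container_of(b, struct toy_cavium_smmu, smmu);
}
```

Core code passes around `struct base_smmu *`; only the Cavium hooks downcast, which is why Will suggests open-coding container_of at the two callsites rather than naming a helper.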
Re: [PATCH 00/15] Arm SMMU refactoring
Hi Robin, On Fri, Aug 09, 2019 at 06:07:37PM +0100, Robin Murphy wrote: > This is a big refactoring of arm-smmu in order to help cope with the > various divergent implementation details currently flying around. So > far we've been accruing various quirks and errata workarounds within > the main flow of the driver, but given that it's written to an > architecture rather than any particular hardware implementation, after > a point these start to become increasingly invasive and potentially > conflict with each other. > > These patches clean up the existing quirks handled by the driver to > lay a foundation on which we can continue to add more in a maintainable > fashion. The idea is that major vendor customisations can then be kept > in arm-smmu-.c implementation files out of each others' way. > > A branch is available at: > > git://linux-arm.org/linux-rm iommu/smmu-impl > > which I'll probably keep tweaking until I'm happy with the names of > things; I just didn't want to delay this initial posting any longer. Thanks, this all looks pretty decent to me. I've mainly left you a bunch of nits (hey, it's a refactoring series!) but I did spot one pre-existing howler that we should address. When do you think you'll have stopped tweaking this so that I can pick it up? I'd really like to see it in 5.4 so that others can start working on top of it. Cheers, Will
Re: [PATCH 08/15] iommu/arm-smmu: Abstract context bank accesses
On Fri, Aug 09, 2019 at 06:07:45PM +0100, Robin Murphy wrote: > Context bank accesses are fiddly enough to deserve a number of extra > helpers to keep the callsites looking sane, even though there are only > one or two of each. > > Signed-off-by: Robin Murphy > --- > drivers/iommu/arm-smmu.c | 137 --- > 1 file changed, 72 insertions(+), 65 deletions(-) > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c > index 72505647b77d..abdcc3f52e2e 100644 > --- a/drivers/iommu/arm-smmu.c > +++ b/drivers/iommu/arm-smmu.c > @@ -82,9 +82,6 @@ > ((smmu->options & ARM_SMMU_OPT_SECURE_CFG_ACCESS) \ > ? 0x400 : 0)) > > -/* Translation context bank */ > -#define ARM_SMMU_CB(smmu, n) ((smmu)->base + (((smmu)->cb_base + (n)) << > (smmu)->pgshift)) > - > #define MSI_IOVA_BASE 0x8000000 > #define MSI_IOVA_LENGTH 0x100000 > > @@ -265,9 +262,29 @@ static void arm_smmu_writel(struct arm_smmu_device > *smmu, int page, int offset, > writel_relaxed(val, arm_smmu_page(smmu, page) + offset); > } > > +static u64 arm_smmu_readq(struct arm_smmu_device *smmu, int page, int offset) > +{ > + return readq_relaxed(arm_smmu_page(smmu, page) + offset); > +} > + > +static void arm_smmu_writeq(struct arm_smmu_device *smmu, int page, int > offset, > + u64 val) > +{ > + writeq_relaxed(val, arm_smmu_page(smmu, page) + offset); > +} > + > #define arm_smmu_read_gr1(s, r) arm_smmu_readl((s), 1, (r)) > #define arm_smmu_write_gr1(s, r, v) arm_smmu_writel((s), 1, (r), (v)) > > +#define arm_smmu_read_cb(s, n, r)\ > + arm_smmu_readl((s), (s)->cb_base + (n), (r)) > +#define arm_smmu_write_cb(s, n, r, v)\ > + arm_smmu_writel((s), (s)->cb_base + (n), (r), (v)) > +#define arm_smmu_read_cb_q(s, n, r) \ > + arm_smmu_readq((s), (s)->cb_base + (n), (r)) > +#define arm_smmu_write_cb_q(s, n, r, v) \ > + arm_smmu_writeq((s), (s)->cb_base + (n), (r), (v)) 'r' for 'offset'? (maybe just rename offset => register in the helpers). 
> struct arm_smmu_option_prop { > u32 opt; > const char *prop; > @@ -423,15 +440,17 @@ static void __arm_smmu_free_bitmap(unsigned long *map, > int idx) > } > > /* Wait for any pending TLB invalidations to complete */ > -static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu, > - void __iomem *sync, void __iomem *status) > +static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu, int page, > + int sync, int status) > { > unsigned int spin_cnt, delay; > + u32 reg; > > - writel_relaxed(QCOM_DUMMY_VAL, sync); > + arm_smmu_writel(smmu, page, sync, QCOM_DUMMY_VAL); > for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) { > for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) { > - if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE)) > + reg = arm_smmu_readl(smmu, page, status); > + if (!(reg & sTLBGSTATUS_GSACTIVE)) > return; > cpu_relax(); > } > @@ -443,12 +462,11 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device > *smmu, > > static void arm_smmu_tlb_sync_global(struct arm_smmu_device *smmu) > { > - void __iomem *base = ARM_SMMU_GR0(smmu); > unsigned long flags; > > spin_lock_irqsave(&smmu->global_sync_lock, flags); > - __arm_smmu_tlb_sync(smmu, base + ARM_SMMU_GR0_sTLBGSYNC, > - base + ARM_SMMU_GR0_sTLBGSTATUS); > + __arm_smmu_tlb_sync(smmu, 0, ARM_SMMU_GR0_sTLBGSYNC, Can we have a #define for page zero, please? Will
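The page/offset addressing these helpers wrap can be sketched as a plain function. Names loosely mirror the patch (cb_base, pgshift), but this is an illustration of the arithmetic only, not driver code:

```c
/* Register r of context bank n lives in page (cb_base + n), and each
 * register page is (1 << pgshift) bytes: the offset from the device
 * base is just page-index << pgshift, plus the register offset. */
static unsigned long cb_reg_offset(unsigned int cb_base, unsigned int n,
                                   unsigned int pgshift, unsigned int r)
{
    return ((unsigned long)(cb_base + n) << pgshift) + r;
}
```

With 4 KiB register pages (pgshift = 12), four global pages (cb_base = 4), bank 2 and register 0x20, this yields (4 + 2) << 12 | 0x20 = 0x6020, matching what the old ARM_SMMU_CB() macro computed inline.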
Re: [PATCH 05/15] iommu/arm-smmu: Split arm_smmu_tlb_inv_range_nosync()
On Fri, Aug 09, 2019 at 06:07:42PM +0100, Robin Murphy wrote: > Since we now use separate iommu_gather_ops for stage 1 and stage 2 > contexts, we may as well divide up the monolithic callback into its > respective stage 1 and stage 2 parts. > > Signed-off-by: Robin Murphy > --- > drivers/iommu/arm-smmu.c | 66 ++-- > 1 file changed, 37 insertions(+), 29 deletions(-) This will conflict with my iommu API batching stuff, but I can sort that out if/when it gets queued by Joerg. > - if (cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) { > - iova &= ~12UL; > - iova |= cfg->asid; > - do { > - writel_relaxed(iova, reg); > - iova += granule; > - } while (size -= granule); > - } else { > - iova >>= 12; > - iova |= (u64)cfg->asid << 48; > - do { > - writeq_relaxed(iova, reg); > - iova += granule >> 12; > - } while (size -= granule); > - } > - } else { > - reg += leaf ? ARM_SMMU_CB_S2_TLBIIPAS2L : > - ARM_SMMU_CB_S2_TLBIIPAS2; > - iova >>= 12; > + if (cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) { > + iova &= ~12UL; Oh baby. You should move code around more often, so I'm forced to take a second look! Can you cook a fix for this that we can route separately, please? I see it also made its way into qcom_iommu.c... Will
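For readers who miss why `iova &= ~12UL` is a howler: 12 is 0b1100, so ~12UL clears only bits 2 and 3, while the intent (assuming the usual 4 KiB TLBI granule) was to clear the low 12 bits. A minimal demonstration:

```c
/* What the code intended: mask off the low 12 bits (4 KiB page offset). */
static unsigned long page_base(unsigned long iova)
{
    return iova & ~0xfffUL;
}

/* What "iova &= ~12UL" actually does: clears only bits 2 and 3,
 * leaving most of the page-offset bits set. */
static unsigned long buggy_page_base(unsigned long iova)
{
    return iova & ~12UL;
}
```

For iova = 0x12345fff, the correct mask yields 0x12345000, while the buggy one yields 0x12345ff3 — garbage in the low bits of the TLBI payload, where the ASID is then OR-ed in.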
Re: [PATCH 7/8] parisc: don't set ARCH_NO_COHERENT_DMA_MMAP
On Thu, Aug 15, 2019 at 10:25:52AM +0100, James Bottomley wrote: > > which means exporting normally cachable memory to userspace is > > relatively dangerous due to cache aliasing. > > > > But normally cachable memory is only allocated by dma_alloc_coherent > > on parisc when using the sba_iommu or ccio_iommu drivers, so just > > remove the .mmap implementation for them so that we don't have to set > > ARCH_NO_COHERENT_DMA_MMAP, which I plan to get rid of. > > So I don't think this is quite right. We have three architectural > variants essentially (hidden behind about 12 cpu types): > >1. pa70xx: These can't turn off page caching, so they were the non > coherent problem case >2. pa71xx: These can manufacture coherent memory simply by turning off > the cache on a per page basis >3. pa8xxx: these have a full cache flush coherence mechanism. > > (I might have this slightly wrong: I vaguely remember the pa71xxlc > variants have some weird cache quirks for DMA as well) > > So I think pa70xx we can't mmap. pa71xx we can provided we mark the > page as uncached ... which should already have happened in the allocate > and pa8xxx which can always mmap dma memory without any special tricks. Except for the different naming scheme vs the code this matches my assumptions. In the code we have three cases (and a fourth EISA case mentioned in comments, but not actually implemented as far as I can tell): arch/parisc/kernel/pci-dma.c says in the top of file comments: ** AFAIK, all PA7100LC and PA7300LC platforms can use this code. and then handles two different cases: for cpu_type == pcxl or pcxl2 it maps the memory as uncached for dma_alloc_coherent, and for all other cpu types it fails the coherent allocations. In addition to that there are the ccio and sba iommu drivers, of which according to your above comment one is always present for pa8xxx. Which brings us back to this patch, which ensures that no cacheable memory is exported to userspace by removing ->mmap from ccio and sba. 
It then enables dma_mmap_coherent for the pcxl or pcxl2 case that allocates uncached memory. For the other cpu types dma_mmap_coherent cannot work, because dma_alloc_coherent already failed for the !pcxl && !pcxl2 case and thus there is no memory to mmap. So if the description is too confusing please suggest a better one, I'm a little lost between all these code names and product names (arch/parisc/include/asm/dma-mapping.h uses yet another set).
Re: [PATCH v8 05/14] media: rkisp1: add Rockchip ISP1 subdev driver
On Thu, Aug 15, 2019 at 07:29:59PM +0900, Tomasz Figa wrote: > On Thu, Aug 15, 2019 at 5:25 PM Sakari Ailus > wrote: > > > > Hi Helen, > > > > On Wed, Aug 14, 2019 at 09:58:05PM -0300, Helen Koike wrote: > > > > ... > > > > > >> +static int rkisp1_isp_sd_set_fmt(struct v4l2_subdev *sd, > > > >> + struct v4l2_subdev_pad_config *cfg, > > > >> + struct v4l2_subdev_format *fmt) > > > >> +{ > > > >> + struct rkisp1_device *isp_dev = sd_to_isp_dev(sd); > > > >> + struct rkisp1_isp_subdev *isp_sd = &isp_dev->isp_sdev; > > > >> + struct v4l2_mbus_framefmt *mf = &fmt->format; > > > >> + > > > > > > > > Note that for sub-device nodes, the driver is itself responsible for > > > > serialising the access to its data structures. > > > > > > But looking at subdev_do_ioctl_lock(), it seems that it serializes the > > > ioctl calls for subdevs, no? Or I'm misunderstanding something (which is > > > most probably) ? > > > > Good question. I had missed this change --- subdev_do_ioctl_lock() is > > relatively new. But setting that lock is still not possible as the struct > > is allocated in the framework and the device is registered before the > > driver gets hold of it. It's a good idea to provide the same serialisation > > for subdevs as well. > > > > I'll get back to this later. > > > > ... > > > > > >> +static int rkisp1_isp_sd_s_power(struct v4l2_subdev *sd, int on) > > > > > > > > If you support runtime PM, you shouldn't implement the s_power op. > > > > > > Is it ok to completely remove the usage of runtime PM then? > > > Like this http://ix.io/1RJb ? > > > > Please use runtime PM instead. In the long run we should get rid of the > > s_power op. Drivers themselves know better when the hardware they control > > should be powered on or off. > > > > One also needs to use runtime PM to handle power domains and power > dependencies on auxiliary devices, e.g. IOMMU. 
> > > > > > tbh I'm not that familiar with runtime PM and I'm not sure what is the > > > difference of it and using s_power op (and > > > Documentation/power/runtime_pm.rst > > > is not being that helpful tbh). > > > > You can find a simple example e.g. in > > drivers/media/platform/atmel/atmel-isi.c . > > > > > > > > > > > > > You'll still need to call s_power on external subdevs though. > > > > > > > >> +{ > > > >> + struct rkisp1_device *isp_dev = sd_to_isp_dev(sd); > > > >> + int ret; > > > >> + > > > >> + v4l2_dbg(1, rkisp1_debug, &isp_dev->v4l2_dev, "s_power: %d\n", on); > > > >> + > > > >> + if (on) { > > > >> + ret = pm_runtime_get_sync(isp_dev->dev); > > > > > > If this is not ok to remove support for runtime PM, then where should I put > > > the call to pm_runtime_get_sync() if not in this s_power op ? > > > > Basically the runtime_resume and runtime_suspend callbacks are where the > > device power state changes are implemented, and pm_runtime_get_sync and > > pm_runtime_put are how the driver controls the power state. > > > > So you no longer need the s_power() op at all. The op needs to be called on > > the pipeline however, as there are drivers that still use it. > > > > For this driver, I suppose we would _get_sync() when we start > streaming (in the hardware, i.e. we want the ISP to start capturing > frames) and _put() when we stop and the driver shouldn't perform any > access to the hardware when the streaming is not active. Agreed. -- Sakari Ailus sakari.ai...@linux.intel.com
Re: [PATCH v8 05/14] media: rkisp1: add Rockchip ISP1 subdev driver
On Thu, Aug 15, 2019 at 5:25 PM Sakari Ailus wrote: > > Hi Helen, > > On Wed, Aug 14, 2019 at 09:58:05PM -0300, Helen Koike wrote: > > ... > > > >> +static int rkisp1_isp_sd_set_fmt(struct v4l2_subdev *sd, > > >> + struct v4l2_subdev_pad_config *cfg, > > >> + struct v4l2_subdev_format *fmt) > > >> +{ > > >> + struct rkisp1_device *isp_dev = sd_to_isp_dev(sd); > > >> + struct rkisp1_isp_subdev *isp_sd = &isp_dev->isp_sdev; > > >> + struct v4l2_mbus_framefmt *mf = &fmt->format; > > >> + > > > > > > Note that for sub-device nodes, the driver is itself responsible for > > > serialising the access to its data structures. > > > > But looking at subdev_do_ioctl_lock(), it seems that it serializes the > > ioctl calls for subdevs, no? Or I'm misunderstanding something (which is > > most probably) ? > > Good question. I had missed this change --- subdev_do_ioctl_lock() is > relatively new. But setting that lock is still not possible as the struct > is allocated in the framework and the device is registered before the > driver gets hold of it. It's a good idea to provide the same serialisation > for subdevs as well. > > I'll get back to this later. > > ... > > > >> +static int rkisp1_isp_sd_s_power(struct v4l2_subdev *sd, int on) > > > > > > If you support runtime PM, you shouldn't implement the s_power op. > > > > Is it ok to completely remove the usage of runtime PM then? > > Like this http://ix.io/1RJb ? > > Please use runtime PM instead. In the long run we should get rid of the > s_power op. Drivers themselves know better when the hardware they control > should be powered on or off. > One also needs to use runtime PM to handle power domains and power dependencies on auxiliary devices, e.g. IOMMU. > > > > tbh I'm not that familiar with runtime PM and I'm not sure what is the > > difference of it and using s_power op (and > > Documentation/power/runtime_pm.rst > > is not being that helpful tbh). > > You can find a simple example e.g. in > drivers/media/platform/atmel/atmel-isi.c . 
> > > > > > > > > You'll still need to call s_power on external subdevs though. > > > > > >> +{ > > >> + struct rkisp1_device *isp_dev = sd_to_isp_dev(sd); > > >> + int ret; > > >> + > > >> + v4l2_dbg(1, rkisp1_debug, &isp_dev->v4l2_dev, "s_power: %d\n", on); > > >> + > > >> + if (on) { > > >> + ret = pm_runtime_get_sync(isp_dev->dev); > > > > If this is not ok to remove support for runtime PM, then where should I put > > the call to pm_runtime_get_sync() if not in this s_power op ? > > Basically the runtime_resume and runtime_suspend callbacks are where the > device power state changes are implemented, and pm_runtime_get_sync and > pm_runtime_put are how the driver controls the power state. > > So you no longer need the s_power() op at all. The op needs to be called on > the pipeline however, as there are drivers that still use it. > For this driver, I suppose we would _get_sync() when we start streaming (in the hardware, i.e. we want the ISP to start capturing frames) and _put() when we stop and the driver shouldn't perform any access to the hardware when the streaming is not active. Best regards, Tomasz
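The get/put pattern Sakari and Tomasz describe can be modelled in a few lines. This is a toy refcount illustrating the semantics of pm_runtime_get_sync()/pm_runtime_put(), not the real runtime-PM implementation:

```c
/* Toy model of runtime-PM refcounting: the device powers up on the
 * 0 -> 1 usage transition and down on 1 -> 0, which is what
 * pm_runtime_get_sync()/pm_runtime_put() arrange by invoking the
 * driver's runtime_resume/runtime_suspend callbacks. */
static int usage_count;
static int powered;

static void toy_pm_get_sync(void)
{
    if (usage_count++ == 0)
        powered = 1;   /* runtime_resume() would run here */
}

static void toy_pm_put(void)
{
    if (--usage_count == 0)
        powered = 0;   /* runtime_suspend() would run here */
}
```

A driver that calls the get in start-streaming and the put in stop-streaming gets correct power state even with multiple users (e.g. an IOMMU in the same power domain holding its own reference), which is why this is preferred over the s_power op.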
Re: [PATCH v9 08/21] iommu/io-pgtable-arm-v7s: Extend MediaTek 4GB Mode
On Thu, 2019-08-15 at 10:51 +0100, Will Deacon wrote: > On Thu, Aug 15, 2019 at 04:47:49PM +0800, Yong Wu wrote: > > On Wed, 2019-08-14 at 15:41 +0100, Will Deacon wrote: > > > On Sat, Aug 10, 2019 at 03:58:08PM +0800, Yong Wu wrote: > > > > MediaTek extend the arm v7s descriptor to support the dram over 4GB. > > > > > > > > In the mt2712 and mt8173, it's called "4GB mode", the physical address > > > > is from 0x4000_0000 to 0x1_3fff_ffff, but from EMI point of view, it > > > > is remapped to high address from 0x1_0000_0000 to 0x1_ffff_ffff, the > > > > bit32 is always enabled. thus, in the M4U, we always enable the bit9 > > > > for all PTEs which means to enable bit32 of physical address. Here is > > > > the detailed remap relationship in the "4GB mode": > > > > CPU PA -> HW PA > > > > 0x4000_0000 0x1_4000_0000 (Add bit32) > > > > 0x8000_0000 0x1_8000_0000 ... > > > > 0xc000_0000 0x1_c000_0000 ... > > > > 0x1_0000_0000 0x1_0000_0000 (No change) > > > > > > So in this example, there are no PAs below 0x4000_0000 yet you later > > > add code to deal with that: > > > > > > > + /* Workaround for MTK 4GB Mode: Add BIT32 only when PA < > > > > 0x4000_0000. */ > > > > + if (cfg->oas == ARM_V7S_MTK_4GB_OAS && paddr < 0x40000000UL) > > > > + paddr |= BIT_ULL(32); > > > > > > Why? Mainline currently doesn't do anything like this for the "4gb mode" > > > support as far as I can tell. In fact, we currently unconditionally set > > > bit 32 in the physical address returned by iova_to_phys() which wouldn't > > > match your CPU PAs listed above, so I'm confused about how this is > > > supposed > > > to work. > > > > Actually current mainline has a bug for this. So I tried to use another > > special patch[1] for it in v8. > > If you're fixing a bug in mainline, I'd prefer to see that as a separate > patch. > > > But the issue is not critical since MediaTek multimedia consumer(v4l2 > > and drm) don't call iommu_iova_to_phys currently. 
> > > > > > > > The way I would like this quirk to work is that the io-pgtable code > > > basically sets bit 9 in the pte when bit 32 is set in the physical > > > address, > > > and sets bit 4 in the pte when bit 33 is set in the physical address. It > > > would then do the opposite when converting a pte to a physical address. > > > > > > That way, your driver can call the page table code directly with the high > > > addresses and we don't have to do any manual offsetting or range checking > > > in the page table code. > > > > In this case, the mt8183 can work successfully while the "4gb > > mode"(mt8173/mt2712) can not. > > > > In the "4gb mode", As the remap relationship above, we should always add > > bit32 in pte as we did in [2]. and need add bit32 in the > > "iova_to_phys"(Not always add.). That means the "4gb mode" has a special > > flow: > > a. Always add bit32 in paddr_to_iopte. > > b. Add bit32 only when PA < 0x4000_0000 in iopte_to_paddr. > > I think this is probably at the heart of my misunderstanding. What is so > special about PAs (is this HW PA or CPU PA?) below 0x4000_0000? Is this RAM > or something else? SRAM and HW registers that the IOMMU cannot access. (Sorry, my mailbox has something wrong.)
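The decode rule Yong Wu describes (point b above) can be modelled as follows. This is a toy model of the remapping described in the thread — pte stores the low 32 bits of the PA, bit 32 is restored on decode only below the 1 GiB boundary — not the driver's actual code:

```c
#include <stdint.h>

#define BIT32 (1ULL << 32)

/* "4GB mode" decode sketch: CPU PAs run 0x4000_0000..0x1_3fff_ffff, and
 * every HW PA has bit 32 set. The pte keeps only the low 32 bits, so on
 * decode, low bits below 0x4000_0000 can only have come from a CPU PA
 * in 0x1_0000_0000..0x1_3fff_ffff (add bit 32 back), while low bits at
 * or above 0x4000_0000 came from a sub-4GB CPU PA (leave as-is). */
static uint64_t mtk_4gb_decode(uint32_t pte_low)
{
    return pte_low < 0x40000000u ? ((uint64_t)pte_low | BIT32)
                                 : (uint64_t)pte_low;
}
```

This makes the asymmetry in the thread concrete: encode always sets the bit-32 flag (bit 9 of the pte), but decode must be conditional to recover the CPU PA rather than the EMI-remapped HW PA.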
Re: [PATCH v9 08/21] iommu/io-pgtable-arm-v7s: Extend MediaTek 4GB Mode
On Thu, Aug 15, 2019 at 06:03:30PM +0800, Yong Wu wrote: > On Thu, 2019-08-15 at 10:51 +0100, Will Deacon wrote: > > On Thu, Aug 15, 2019 at 04:47:49PM +0800, Yong Wu wrote: > > > On Wed, 2019-08-14 at 15:41 +0100, Will Deacon wrote: > > > > On Sat, Aug 10, 2019 at 03:58:08PM +0800, Yong Wu wrote: > > > > > MediaTek extend the arm v7s descriptor to support the dram over 4GB. > > > > > > > > > > In the mt2712 and mt8173, it's called "4GB mode", the physical address > > > > > is from 0x4000_0000 to 0x1_3fff_ffff, but from EMI point of view, it > > > > > is remapped to high address from 0x1_0000_0000 to 0x1_ffff_ffff, the > > > > > bit32 is always enabled. thus, in the M4U, we always enable the bit9 > > > > > for all PTEs which means to enable bit32 of physical address. Here is > > > > > the detailed remap relationship in the "4GB mode": > > > > > CPU PA -> HW PA > > > > > 0x4000_0000 0x1_4000_0000 (Add bit32) > > > > > 0x8000_0000 0x1_8000_0000 ... > > > > > 0xc000_0000 0x1_c000_0000 ... > > > > > 0x1_0000_0000 0x1_0000_0000 (No change) > > > > > > > > So in this example, there are no PAs below 0x4000_0000 yet you later > > > > add code to deal with that: > > > > > > > > > + /* Workaround for MTK 4GB Mode: Add BIT32 only when PA < > > > > > 0x4000_0000. */ > > > > > + if (cfg->oas == ARM_V7S_MTK_4GB_OAS && paddr < 0x40000000UL) > > > > > + paddr |= BIT_ULL(32); > > > > > > > > Why? Mainline currently doesn't do anything like this for the "4gb mode" > > > > support as far as I can tell. In fact, we currently unconditionally set > > > > bit 32 in the physical address returned by iova_to_phys() which wouldn't > > > > match your CPU PAs listed above, so I'm confused about how this is > > > > supposed > > > > to work. > > > > > > Actually current mainline has a bug for this. So I tried to use another > > > special patch[1] for it in v8. > > > > If you're fixing a bug in mainline, I'd prefer to see that as a separate > > patch. 
> > > > > But the issue is not critical since MediaTek multimedia consumer(v4l2 > > and drm) don't call iommu_iova_to_phys currently. > > > > > > > > > > The way I would like this quirk to work is that the io-pgtable code > > > > basically sets bit 9 in the pte when bit 32 is set in the physical > > > > address, > > > > and sets bit 4 in the pte when bit 33 is set in the physical address. It > > > > would then do the opposite when converting a pte to a physical address. > > > > > > > > That way, your driver can call the page table code directly with the > > > > high > > > > addresses and we don't have to do any manual offsetting or range > > > > checking > > > > in the page table code. > > > > > > In this case, the mt8183 can work successfully while the "4gb > > > mode"(mt8173/mt2712) can not. > > > > > > In the "4gb mode", as the remap relationship above, we should always add > > > bit32 in pte as we did in [2]. and need add bit32 in the > > > "iova_to_phys"(Not always add.). That means the "4gb mode" has a special > > > flow: > > > a. Always add bit32 in paddr_to_iopte. > > > b. Add bit32 only when PA < 0x4000_0000 in iopte_to_paddr. > > > > I think this is probably at the heart of my misunderstanding. What is so > > special about PAs (is this HW PA or CPU PA?) below 0x4000_0000? Is this RAM > > or something else? > > SRAM and the HW registers. Do we actually need to be able to map those in the IOMMU? Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v9 08/21] iommu/io-pgtable-arm-v7s: Extend MediaTek 4GB Mode
On Thu, 2019-08-15 at 10:51 +0100, Will Deacon wrote: > On Thu, Aug 15, 2019 at 04:47:49PM +0800, Yong Wu wrote: > > On Wed, 2019-08-14 at 15:41 +0100, Will Deacon wrote: > > > On Sat, Aug 10, 2019 at 03:58:08PM +0800, Yong Wu wrote: > > > > MediaTek extend the arm v7s descriptor to support the dram over 4GB. > > > > > > > > In the mt2712 and mt8173, it's called "4GB mode", the physical address > > > > is from 0x4000_0000 to 0x1_3fff_ffff, but from EMI point of view, it > > > > is remapped to high address from 0x1_0000_0000 to 0x1_ffff_ffff, the > > > > bit32 is always enabled. thus, in the M4U, we always enable the bit9 > > > > for all PTEs which means to enable bit32 of physical address. Here is > > > > the detailed remap relationship in the "4GB mode": > > > > CPU PA -> HW PA > > > > 0x4000_0000 0x1_4000_0000 (Add bit32) > > > > 0x8000_0000 0x1_8000_0000 ... > > > > 0xc000_0000 0x1_c000_0000 ... > > > > 0x1_0000_0000 0x1_0000_0000 (No change) > > > > > > So in this example, there are no PAs below 0x4000_0000 yet you later > > > add code to deal with that: > > > > > > > + /* Workaround for MTK 4GB Mode: Add BIT32 only when PA < 0x4000_0000. */ > > > > + if (cfg->oas == ARM_V7S_MTK_4GB_OAS && paddr < 0x40000000UL) > > > > + paddr |= BIT_ULL(32); > > > > > > Why? Mainline currently doesn't do anything like this for the "4gb mode" > > > support as far as I can tell. In fact, we currently unconditionally set > > > bit 32 in the physical address returned by iova_to_phys() which wouldn't > > > match your CPU PAs listed above, so I'm confused about how this is > > > supposed > > > to work. > > > > Actually current mainline has a bug for this. So I tried to use another > > special patch[1] for it in v8. > > If you're fixing a bug in mainline, I'd prefer to see that as a separate > patch. > > > But the issue is not critical since MediaTek multimedia consumer(v4l2 > > and drm) don't call iommu_iova_to_phys currently. 
> > > > > > > > The way I would like this quirk to work is that the io-pgtable code > > > basically sets bit 9 in the pte when bit 32 is set in the physical > > > address, > > > and sets bit 4 in the pte when bit 33 is set in the physical address. It > > > would then do the opposite when converting a pte to a physical address. > > > > > > That way, your driver can call the page table code directly with the high > > > addresses and we don't have to do any manual offsetting or range checking > > > in the page table code. > > > > In this case, the mt8183 can work successfully while the "4gb > > mode"(mt8173/mt2712) can not. > > > > In the "4gb mode", as the remap relationship above, we should always add > > bit32 in pte as we did in [2]. and need add bit32 in the > > "iova_to_phys"(Not always add.). That means the "4gb mode" has a special > > flow: > > a. Always add bit32 in paddr_to_iopte. > > b. Add bit32 only when PA < 0x4000_0000 in iopte_to_paddr. > > I think this is probably at the heart of my misunderstanding. What is so > special about PAs (is this HW PA or CPU PA?) below 0x4000_0000? Is this RAM > or something else? SRAM and the HW registers. > > > > Please can you explain to me why the diff below doesn't work on top of > > > this series? > > > > The diff below is just I did in v8[3]. The difference is that I moved the > > "4gb mode" special flow in the mtk_iommu.c in v8, the code is like > > [4]below. When I sent v9, I found that I can distinguish the "4gb mode" > > with "oas == 33" in v7s. then I can "simply" add the 4gb special flow[5] > > based on your diff. > > > > > > > I'm happy to chat on IRC if you think it would be easier, > > > because I have a horrible feeling that we've been talking past each other > > > and I'd like to see this support merged for 5.4. > > > > Thanks very much for your view, I'm sorry that I don't have IRC. I will > > send the next version quickly if we have a conclusion here. Then which > > way is better? 
If you'd like to keep > > the "4gb mode" special flow into mtk_iommu.c. > > I mean, we could even talk on the phone if necessary because I can't accept > this code unless I understand how it works! > > To be blunt, I'd like to avoid the io-pgtable changes looking different to > what I suggested: > > > > diff --git a/drivers/iommu/io-pgtable-arm-v7s.c > > > b/drivers/iommu/io-pgtable-arm-v7s.c > > > index ab12ef5f8b03..d8d84617c822 100644 > > > --- a/drivers/iommu/io-pgtable-arm-v7s.c > > > +++ b/drivers/iommu/io-pgtable-arm-v7s.c > > > @@ -184,7 +184,7 @@ static arm_v7s_iopte paddr_to_iopte(phys_addr_t > > > paddr, int lvl, > > > arm_v7s_iopte pte = paddr & ARM_V7S_LVL_MASK(lvl); > > > > > > if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_EXT) { > > > - if ((paddr & BIT_ULL(32)) || cfg->oas == ARM_V7S_MTK_4GB_OAS) > > > + if (paddr & BIT_ULL(32)) > > > pte |= ARM_V7S_ATTR_MTK_PA_BIT32; > > >
Re: [PATCH v9 08/21] iommu/io-pgtable-arm-v7s: Extend MediaTek 4GB Mode
On Thu, Aug 15, 2019 at 04:47:49PM +0800, Yong Wu wrote: > On Wed, 2019-08-14 at 15:41 +0100, Will Deacon wrote: > > On Sat, Aug 10, 2019 at 03:58:08PM +0800, Yong Wu wrote: > > > MediaTek extend the arm v7s descriptor to support the dram over 4GB. > > > > > > In the mt2712 and mt8173, it's called "4GB mode", the physical address > > > is from 0x4000_0000 to 0x1_3fff_ffff, but from EMI point of view, it > > > is remapped to high address from 0x1_0000_0000 to 0x1_ffff_ffff, the > > > bit32 is always enabled. thus, in the M4U, we always enable the bit9 > > > for all PTEs which means to enable bit32 of physical address. Here is > > > the detailed remap relationship in the "4GB mode": > > > CPU PA -> HW PA > > > 0x4000_0000 0x1_4000_0000 (Add bit32) > > > 0x8000_0000 0x1_8000_0000 ... > > > 0xc000_0000 0x1_c000_0000 ... > > > 0x1_0000_0000 0x1_0000_0000 (No change) > > > > So in this example, there are no PAs below 0x4000_0000 yet you later > > add code to deal with that: > > > > > + /* Workaround for MTK 4GB Mode: Add BIT32 only when PA < 0x4000_0000. */ > > > + if (cfg->oas == ARM_V7S_MTK_4GB_OAS && paddr < 0x40000000UL) > > > + paddr |= BIT_ULL(32); > > > > Why? Mainline currently doesn't do anything like this for the "4gb mode" > > support as far as I can tell. In fact, we currently unconditionally set > > bit 32 in the physical address returned by iova_to_phys() which wouldn't > > match your CPU PAs listed above, so I'm confused about how this is supposed > > to work. > > Actually current mainline has a bug for this. So I tried to use another > special patch[1] for it in v8. If you're fixing a bug in mainline, I'd prefer to see that as a separate patch. > But the issue is not critical since MediaTek multimedia consumer(v4l2 > and drm) don't call iommu_iova_to_phys currently. > > > > The way I would like this quirk to work is that the io-pgtable code > > basically sets bit 9 in the pte when bit 32 is set in the physical address, > > and sets bit 4 in the pte when bit 33 is set in the physical address. 
It > > would then do the opposite when converting a pte to a physical address. > > > > That way, your driver can call the page table code directly with the high > > addresses and we don't have to do any manual offsetting or range checking > > in the page table code. In this case, the mt8183 can work successfully while the "4gb mode"(mt8173/mt2712) can not. In the "4gb mode", as the remap relationship above, we should always add bit32 in pte as we did in [2]. and need add bit32 in the "iova_to_phys"(Not always add.). That means the "4gb mode" has a special flow: a. Always add bit32 in paddr_to_iopte. b. Add bit32 only when PA < 0x4000_0000 in iopte_to_paddr. I think this is probably at the heart of my misunderstanding. What is so special about PAs (is this HW PA or CPU PA?) below 0x4000_0000? Is this RAM or something else? > > Please can you explain to me why the diff below doesn't work on top of > > this series? > > The diff below is just I did in v8[3]. The difference is that I moved the > "4gb mode" special flow in the mtk_iommu.c in v8, the code is like > [4]below. When I sent v9, I found that I can distinguish the "4gb mode" > with "oas == 33" in v7s. then I can "simply" add the 4gb special flow[5] > based on your diff. > > > I'm happy to chat on IRC if you think it would be easier, > > because I have a horrible feeling that we've been talking past each other > > and I'd like to see this support merged for 5.4. > > Thanks very much for your view, I'm sorry that I don't have IRC. I will > send the next version quickly if we have a conclusion here. Then which > way is better? If you'd like to keep the pagetable code clean, I will add the "4gb mode" special flow into mtk_iommu.c. I mean, we could even talk on the phone if necessary because I can't accept this code unless I understand how it works! 
To be blunt, I'd like to avoid the io-pgtable changes looking different to what I suggested: > > diff --git a/drivers/iommu/io-pgtable-arm-v7s.c > > b/drivers/iommu/io-pgtable-arm-v7s.c > > index ab12ef5f8b03..d8d84617c822 100644 > > --- a/drivers/iommu/io-pgtable-arm-v7s.c > > +++ b/drivers/iommu/io-pgtable-arm-v7s.c > > @@ -184,7 +184,7 @@ static arm_v7s_iopte paddr_to_iopte(phys_addr_t paddr, > > int lvl, > > arm_v7s_iopte pte = paddr & ARM_V7S_LVL_MASK(lvl); > > > > if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_EXT) { > > - if ((paddr & BIT_ULL(32)) || cfg->oas == ARM_V7S_MTK_4GB_OAS) > > + if (paddr & BIT_ULL(32)) > > pte |= ARM_V7S_ATTR_MTK_PA_BIT32; > > if (paddr & BIT_ULL(33)) > > pte |= ARM_V7S_ATTR_MTK_PA_BIT33; > > @@ -206,17 +206,14 @@ static phys_addr_t iopte_to_paddr(arm_v7s_iopte pte, > > int lvl, > > mask = ARM_V7S_LVL_MASK(lvl); > > > > paddr = pte & mask; > > - if (cfg->oas == 32 || !(cfg->quirks &
Re: [PATCH 7/8] parisc: don't set ARCH_NO_COHERENT_DMA_MMAP
On Thu, 2019-08-08 at 19:00 +0300, Christoph Hellwig wrote: > parisc is the only architecture that sets ARCH_NO_COHERENT_DMA_MMAP > when an MMU is enabled. AFAIK this is because parisc CPUs use VIVT > caches, We're actually VIPT but the same principle applies. > which means exporting normally cachable memory to userspace is > relatively dangerous due to cache aliasing. > > But normally cachable memory is only allocated by dma_alloc_coherent > on parisc when using the sba_iommu or ccio_iommu drivers, so just > remove the .mmap implementation for them so that we don't have to set > ARCH_NO_COHERENT_DMA_MMAP, which I plan to get rid of. So I don't think this is quite right. We have three architectural variants essentially (hidden behind about 12 cpu types): 1. pa70xx: These can't turn off page caching, so they were the non-coherent problem case 2. pa71xx: These can manufacture coherent memory simply by turning off the cache on a per page basis 3. pa8xxx: these have a full cache flush coherence mechanism. (I might have this slightly wrong: I vaguely remember the pa71xxlc variants have some weird cache quirks for DMA as well) So I think pa70xx we can't mmap. pa71xx we can provided we mark the page as uncached ... which should already have happened in the allocate and pa8xxx which can always mmap dma memory without any special tricks. James
Re: [PATCH v9 08/21] iommu/io-pgtable-arm-v7s: Extend MediaTek 4GB Mode
On Wed, 2019-08-14 at 15:41 +0100, Will Deacon wrote: > Hi Yong Wu, > > Sorry, but I'm still deeply confused by this patch. Sorry for this. the "4GB mode" really is a bit odd... > > On Sat, Aug 10, 2019 at 03:58:08PM +0800, Yong Wu wrote: > > MediaTek extend the arm v7s descriptor to support the dram over 4GB. > > > > In the mt2712 and mt8173, it's called "4GB mode", the physical address > > is from 0x4000_0000 to 0x1_3fff_ffff, but from EMI point of view, it > > is remapped to high address from 0x1_0000_0000 to 0x1_ffff_ffff, the > > bit32 is always enabled. thus, in the M4U, we always enable the bit9 > > for all PTEs which means to enable bit32 of physical address. Here is > > the detailed remap relationship in the "4GB mode": > > CPU PA -> HW PA > > 0x4000_0000 0x1_4000_0000 (Add bit32) > > 0x8000_0000 0x1_8000_0000 ... > > 0xc000_0000 0x1_c000_0000 ... > > 0x1_0000_0000 0x1_0000_0000 (No change) > > So in this example, there are no PAs below 0x4000_0000 yet you later > add code to deal with that: > > > + /* Workaround for MTK 4GB Mode: Add BIT32 only when PA < 0x4000_0000. */ > > + if (cfg->oas == ARM_V7S_MTK_4GB_OAS && paddr < 0x40000000UL) > > + paddr |= BIT_ULL(32); > > Why? Mainline currently doesn't do anything like this for the "4gb mode" > support as far as I can tell. In fact, we currently unconditionally set > bit 32 in the physical address returned by iova_to_phys() which wouldn't > match your CPU PAs listed above, so I'm confused about how this is supposed > to work. Actually current mainline has a bug for this. So I tried to use another special patch[1] for it in v8. But the issue is not critical since MediaTek multimedia consumer(v4l2 and drm) don't call iommu_iova_to_phys currently. > > The way I would like this quirk to work is that the io-pgtable code > basically sets bit 9 in the pte when bit 32 is set in the physical address, > and sets bit 4 in the pte when bit 33 is set in the physical address. It > would then do the opposite when converting a pte to a physical address. 
> > That way, your driver can call the page table code directly with the high > addresses and we don't have to do any manual offsetting or range checking > in the page table code. In this case, the mt8183 can work successfully while the "4gb mode"(mt8173/mt2712) can not. In the "4gb mode", as the remap relationship above, we should always add bit32 in pte as we did in [2]. and need add bit32 in the "iova_to_phys"(Not always add.). That means the "4gb mode" has a special flow: a. Always add bit32 in paddr_to_iopte. b. Add bit32 only when PA < 0x4000_0000 in iopte_to_paddr. > > Please can you explain to me why the diff below doesn't work on top of > this series? The diff below is just I did in v8[3]. The difference is that I moved the "4gb mode" special flow in the mtk_iommu.c in v8, the code is like [4]below. When I sent v9, I found that I can distinguish the "4gb mode" with "oas == 33" in v7s. then I can "simply" add the 4gb special flow[5] based on your diff. > I'm happy to chat on IRC if you think it would be easier, > because I have a horrible feeling that we've been talking past each other > and I'd like to see this support merged for 5.4. Thanks very much for your view, I'm sorry that I don't have IRC. I will send the next version quickly if we have a conclusion here. Then which way is better? If you'd like to keep the pagetable code clean, I will add the "4gb mode" special flow into mtk_iommu.c. Thanks. 
[1]http://lists.infradead.org/pipermail/linux-mediatek/2019-June/020988.html [2] https://elixir.bootlin.com/linux/v5.3-rc4/source/drivers/iommu/io-pgtable-arm-v7s.c#L299 [3]http://lists.infradead.org/pipermail/linux-mediatek/2019-June/020991.html [4]==4gb mode special flow in mtk_iommu.c== +#define MTK_IOMMU_4GB_MODE_REMAP_BASE 0x140000000UL @@ -380,12 +379,16 @@ static int mtk_iommu_map(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t size, int prot) { struct mtk_iommu_domain *dom = to_mtk_domain(domain); + struct mtk_iommu_data *data = mtk_iommu_get_m4u_data(); unsigned long flags; int ret; + /* The "4GB mode" M4U physically can not use the lower remap of Dram. */ + if (data->enable_4GB) + paddr |= BIT_ULL(32); + spin_lock_irqsave(&dom->pgtlock, flags); - ret = dom->iop->map(dom->iop, iova, paddr & DMA_BIT_MASK(32), - size, prot); + ret = dom->iop->map(dom->iop, iova, paddr, size, prot); spin_unlock_irqrestore(&dom->pgtlock, flags); return ret; @@ -422,8 +425,8 @@ static phys_addr_t mtk_iommu_iova_to_phys(struct iommu_domain *domain, pa = dom->iop->iova_to_phys(dom->iop, iova); spin_unlock_irqrestore(&dom->pgtlock, flags); - if (data->enable_4GB && pa < MTK_IOMMU_4GB_MODE_REMAP_BASE) - pa |= BIT_ULL(32); +
Re: [PATCH v3 hmm 08/11] drm/radeon: use mmu_notifier_get/put for struct radeon_mn
On 07.08.19 at 01:15, Jason Gunthorpe wrote: From: Jason Gunthorpe radeon is using a device global hash table to track what mmu_notifiers have been registered on struct mm. This is better served with the new get/put scheme instead. radeon has a bug where it was not blocking notifier release() until all the BO's had been invalidated. This could result in a use after free of the pages backing the BOs. This is tied into a second bug where radeon left the notifiers running endlessly even once the interval tree became empty. This could result in a use after free with module unload. Both are fixed by changing the lifetime model, the BOs exist in the interval tree with their natural lifetimes independent of the mm_struct lifetime using the get/put scheme. The release runs synchronously and just does invalidate_start across the entire interval tree to create the required DMA fence. Additions to the interval tree after release are already impossible as only current->mm is used during the add. Signed-off-by: Jason Gunthorpe Acked-by: Christian König But I'm wondering if we shouldn't completely drop radeon userptr support. It's just too buggy, Christian. --- drivers/gpu/drm/radeon/radeon.h | 3 - drivers/gpu/drm/radeon/radeon_device.c | 2 - drivers/gpu/drm/radeon/radeon_drv.c | 2 + drivers/gpu/drm/radeon/radeon_mn.c | 157 ++--- 4 files changed, 38 insertions(+), 126 deletions(-) AMD team: I wonder if kfd has similar lifetime issues? 
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 32808e50be12f8..918164f90b114a 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -2451,9 +2451,6 @@ struct radeon_device { /* tracking pinned memory */ u64 vram_pin_size; u64 gart_pin_size; - - struct mutex mn_lock; - DECLARE_HASHTABLE(mn_hash, 7); }; bool radeon_is_px(struct drm_device *dev); diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index dceb554e567446..788b1d8a80e660 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -1325,8 +1325,6 @@ int radeon_device_init(struct radeon_device *rdev, init_rwsem(&rdev->pm.mclk_lock); init_rwsem(&rdev->exclusive_lock); init_waitqueue_head(&rdev->irq.vblank_queue); - mutex_init(&rdev->mn_lock); - hash_init(rdev->mn_hash); r = radeon_gem_init(rdev); if (r) return r; diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c index a6cbe11f79c611..b6535ac91fdb74 100644 --- a/drivers/gpu/drm/radeon/radeon_drv.c +++ b/drivers/gpu/drm/radeon/radeon_drv.c @@ -35,6 +35,7 @@ #include #include #include +#include <linux/mmu_notifier.h> #include #include @@ -624,6 +625,7 @@ static void __exit radeon_exit(void) { pci_unregister_driver(pdriver); radeon_unregister_atpx_handler(); + mmu_notifier_synchronize(); } module_init(radeon_init); diff --git a/drivers/gpu/drm/radeon/radeon_mn.c b/drivers/gpu/drm/radeon/radeon_mn.c index 8c3871ed23a9f0..fc8254273a800b 100644 --- a/drivers/gpu/drm/radeon/radeon_mn.c +++ b/drivers/gpu/drm/radeon/radeon_mn.c @@ -37,17 +37,8 @@ #include "radeon.h" struct radeon_mn { - /* constant after initialisation */ - struct radeon_device *rdev; - struct mm_struct *mm; struct mmu_notifier mn; - /* only used on destruction */ - struct work_struct work; - - /* protected by rdev->mn_lock */ - struct hlist_node node; - /* objects protected by lock */ struct mutex lock; struct rb_root_cached objects; @@ -58,55 +49,6 @@ struct 
radeon_mn_node { struct list_head bos; }; -/** - * radeon_mn_destroy - destroy the rmn - * - * @work: previously scheduled work item - * - * Lazy destroys the notifier from a work item - */ -static void radeon_mn_destroy(struct work_struct *work) -{ - struct radeon_mn *rmn = container_of(work, struct radeon_mn, work); - struct radeon_device *rdev = rmn->rdev; - struct radeon_mn_node *node, *next_node; - struct radeon_bo *bo, *next_bo; - - mutex_lock(&rdev->mn_lock); - mutex_lock(&rmn->lock); - hash_del(&rmn->node); - rbtree_postorder_for_each_entry_safe(node, next_node, &rmn->objects.rb_root, it.rb) { - - interval_tree_remove(&node->it, &rmn->objects); - list_for_each_entry_safe(bo, next_bo, &node->bos, mn_list) { - bo->mn = NULL; - list_del_init(&bo->mn_list); - } - kfree(node); - } - mutex_unlock(&rmn->lock); - mutex_unlock(&rdev->mn_lock); - mmu_notifier_unregister(&rmn->mn, rmn->mm); - kfree(rmn); -} - -/** - * radeon_mn_release - callback to notify about mm destruction - * - * @mn: our notifier - *
Re: [PATCH 7/8] parisc: don't set ARCH_NO_COHERENT_DMA_MMAP
Helge, or other parisc folks: can you take a look at this patch in particular and the series in general? Thanks!
Re: [PATCH v9 0/5] treewide: improve R-Car SDHI performance
So, what are we going to do with this series? As said before I'd volunteer to pick this up through the dma-mapping tree, but I'd like to see ACKs from the other maintainers as well.
Re: [PATCH 06/10] iommu: Remember when default domain type was set on kernel command line
Hey Lu Baolu, thanks for your review! On Thu, Aug 15, 2019 at 01:01:57PM +0800, Lu Baolu wrote: > > +#define IOMMU_CMD_LINE_DMA_API (1 << 0) > > Prefer BIT() macro? Yes, I'll change that. > > + iommu_set_cmd_line_dma_api(); > > IOMMU command line is also set in other places, for example, > iommu_setup() (arch/x86/kernel/pci-dma.c). Need to call this there as > well? You are right, I'll better add a 'bool cmd_line' parameter to the iommu_set_default_*() functions and tell the IOMMU core this way. That will also fix iommu=pt/nopt. Thanks, Joerg
Re: [PATCH 08/10] iommu: Set default domain type at runtime
Hi, On 8/14/19 9:38 PM, Joerg Roedel wrote: From: Joerg Roedel Set the default domain-type at runtime, not at compile-time. This keeps default domain type setting in one place when we have to change it at runtime. Signed-off-by: Joerg Roedel --- drivers/iommu/iommu.c | 23 +++ 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 233bc22b487e..96cc7cc8ab21 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -26,11 +26,8 @@ static struct kset *iommu_group_kset; static DEFINE_IDA(iommu_group_ida); -#ifdef CONFIG_IOMMU_DEFAULT_PASSTHROUGH -static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY; -#else -static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_DMA; -#endif + +static unsigned int iommu_def_domain_type __read_mostly; static bool iommu_dma_strict __read_mostly = true; static u32 iommu_cmd_line __read_mostly; @@ -76,7 +73,7 @@ static void iommu_set_cmd_line_dma_api(void) iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API; } -static bool __maybe_unused iommu_cmd_line_dma_api(void) +static bool iommu_cmd_line_dma_api(void) { return !!(iommu_cmd_line & IOMMU_CMD_LINE_DMA_API); } @@ -115,8 +112,18 @@ static const char *iommu_domain_type_str(unsigned int t) static int __init iommu_subsys_init(void) { - pr_info("Default domain type: %s\n", - iommu_domain_type_str(iommu_def_domain_type)); + bool cmd_line = iommu_cmd_line_dma_api(); + + if (!cmd_line) { + if (IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH)) + iommu_set_default_passthrough(); + else + iommu_set_default_translated(); This overrides kernel parameters parsed in iommu_setup(), for example, iommu=pt won't work anymore. Best regards, Lu Baolu + } + + pr_info("Default domain type: %s %s\n", + iommu_domain_type_str(iommu_def_domain_type), + cmd_line ? "(set via kernel command line)" : ""); return 0; }
Re: [PATCH v6 5/8] iommu: Add bounce page APIs
Hi Joerg, On 8/14/19 4:38 PM, Joerg Roedel wrote: Hi Lu Baolu, On Tue, Jul 30, 2019 at 12:52:26PM +0800, Lu Baolu wrote: * iommu_bounce_map(dev, addr, paddr, size, dir, attrs) - Map a buffer start at DMA address @addr in bounce page manner. For buffer parts that doesn't cross a whole minimal IOMMU page, the bounce page policy is applied. A bounce page mapped by swiotlb will be used as the DMA target in the IOMMU page table. Otherwise, the physical address @paddr is mapped instead. * iommu_bounce_unmap(dev, addr, size, dir, attrs) - Unmap the buffer mapped with iommu_bounce_map(). The bounce page will be torn down after the bounced data get synced. * iommu_bounce_sync(dev, addr, size, dir, target) - Sync the bounced data in case the bounce mapped buffer is reused. I don't really get why this API extension is needed for your use-case. Can't this just be done using iommu_map/unmap operations? Can you please elaborate a bit why these functions are needed? iommu_map/unmap() APIs haven't parameters for dma direction and attributions. These parameters are elementary for DMA APIs. Say, after map, if the dma direction is TO_DEVICE and a bounce buffer is used, we must sync the data from the original dma buffer to the bounce buffer; In the opposite direction, if dma is FROM_DEVICE, before unmap, we need to sync the data from the bounce buffer onto the original buffer. The code in these functions are common to all iommu drivers which want to use bounce pages for untrusted devices. So I put them in the iommu.c. Or, maybe drivers/iommu/dma-iommu.c is more suitable? Best regards, Lu Baolu