Re: [PATCH] iommu/arm-smmu-v3: add nr_ats_masters to avoid unnecessary operations
On 2019/8/14 19:14, Will Deacon wrote: > Hi, > > I've been struggling with the memory ordering requirements here. More below. > > On Thu, Aug 01, 2019 at 08:20:40PM +0800, Zhen Lei wrote: >> When (smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS) is true, even if a >> smmu domain does not contain any ats master, the operations of >> arm_smmu_atc_inv_to_cmd() and lock protection in arm_smmu_atc_inv_domain() >> are always executed. This will impact performance, especially in >> multi-core and stress scenarios. For my FIO test scenario, about 8% >> performance reduced. >> >> In fact, we can use a atomic member to record how many ats masters the >> smmu contains. And check that without traverse the list and check all >> masters one by one in the lock protection. >> >> Fixes: 9ce27afc0830 ("iommu/arm-smmu-v3: Add support for PCI ATS") >> Signed-off-by: Zhen Lei >> --- >> drivers/iommu/arm-smmu-v3.c | 10 -- >> 1 file changed, 8 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c >> index a9a9fabd396804a..1b370d9aca95f94 100644 >> --- a/drivers/iommu/arm-smmu-v3.c >> +++ b/drivers/iommu/arm-smmu-v3.c >> @@ -631,6 +631,7 @@ struct arm_smmu_domain { >> >> struct io_pgtable_ops *pgtbl_ops; >> boolnon_strict; >> +atomic_tnr_ats_masters; >> >> enum arm_smmu_domain_stage stage; >> union { >> @@ -1531,7 +1532,7 @@ static int arm_smmu_atc_inv_domain(struct >> arm_smmu_domain *smmu_domain, >> struct arm_smmu_cmdq_ent cmd; >> struct arm_smmu_master *master; >> >> -if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS)) >> +if (!atomic_read(_domain->nr_ats_masters)) >> return 0; > > This feels wrong to me: the CPU can speculate ahead of time that > 'nr_ats_masters' is 0, but we could have a concurrent call to '->attach()' > for an ATS-enabled device. Wouldn't it then be possible for the new device > to populate its ATC as a result of speculative accesses for the mapping that > we're tearing down? > > The devices lock solves this problem by serialising invalidation with > '->attach()/->detach()' operations. > > John's suggestion of RCU might work better, but I think you'll need to call > synchronize_rcu() between adding yourself to the 'devices' list and enabling > ATS. > > What do you think? I have updated my patch and just sent, below it's my thoughts. - if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS)) + /* +* The protectiom of spinlock(_domain->devices_lock) is omitted. +* Because for a given master, its map/unmap operations should only be +* happened after it has been attached and before it has been detached. +* So that, if at least one master need to be atc invalidated, the +* value of smmu_domain->nr_ats_masters can not be zero. +* +* This can alleviate performance loss in multi-core scenarios. +*/ + if (!smmu_domain->nr_ats_masters) > >> arm_smmu_atc_inv_to_cmd(ssid, iova, size, ); >> @@ -1869,6 +1870,7 @@ static int arm_smmu_enable_ats(struct arm_smmu_master >> *master) >> size_t stu; >> struct pci_dev *pdev; >> struct arm_smmu_device *smmu = master->smmu; >> +struct arm_smmu_domain *smmu_domain = master->domain; >> struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev); >> >> if (!(smmu->features & ARM_SMMU_FEAT_ATS) || !dev_is_pci(master->dev) || >> @@ -1887,12 +1889,15 @@ static int arm_smmu_enable_ats(struct >> arm_smmu_master *master) >> return ret; >> >> master->ats_enabled = true; >> +atomic_inc(_domain->nr_ats_masters); > > Here, we need to make sure that concurrent invalidation sees the updated > 'nr_ats_masters' value before ATS is enabled for the device, otherwise we > could miss an ATC invalidation. > > I think the code above gets this guarantee because of the way that ATS is > enabled in the STE, which ensures that we issue invalidation commands before > making the STE 'live'; this has the side-effect of a write barrier before > updating PROD, which I think we also rely on for installing the CD pointer. > > Put another way: writes are ordered before a subsequent command insertion. > > Do you agree? If so, I'll add a comment because this is subtle and easily > overlooked. > >> static void arm_smmu_disable_ats(struct arm_smmu_master *master) >> { >> struct arm_smmu_cmdq_ent cmd; >> +struct arm_smmu_domain *smmu_domain = master->domain; >> >> if (!master->ats_enabled || !dev_is_pci(master->dev)) >> return; >> @@ -1901,6 +1906,7 @@ static void arm_smmu_disable_ats(struct >> arm_smmu_master *master) >> arm_smmu_atc_inv_master(master, ); >> pci_disable_ats(to_pci_dev(master->dev)); >> master->ats_enabled = false; >> +atomic_dec(_domain->nr_ats_masters); > > This part is the other way around:
[PATCH v2 0/2] add nr_ats_masters for quickly check
v1 --> v2: 1. change the type of nr_ats_masters from atomic_t to int, and move its ind/dec operation from arm_smmu_enable_ats()/arm_smmu_disable_ats() to arm_smmu_attach_dev()/arm_smmu_detach_dev(), and protected by "spin_lock_irqsave(_domain->devices_lock, flags);" Zhen Lei (2): iommu/arm-smmu-v3: don't add a master into smmu_domain before it's ready iommu/arm-smmu-v3: add nr_ats_masters for quickly check drivers/iommu/arm-smmu-v3.c | 22 +- 1 file changed, 17 insertions(+), 5 deletions(-) -- 1.8.3
[PATCH v2 2/2] iommu/arm-smmu-v3: add nr_ats_masters for quickly check
When (smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS) is true, even if a smmu domain does not contain any ats master, the operations of arm_smmu_atc_inv_to_cmd() and lock protection in arm_smmu_atc_inv_domain() are always executed. This will impact performance, especially in multi-core and stress scenarios. For my FIO test scenario, about 8% performance reduced. In fact, we can use a struct member to record how many ats masters that the smmu contains. And check that without traverse the list and check all masters one by one in the lock protection. Fixes: 9ce27afc0830 ("iommu/arm-smmu-v3: Add support for PCI ATS") Signed-off-by: Zhen Lei --- drivers/iommu/arm-smmu-v3.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 29056d9bb12aa01..154334d3310c9b8 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -631,6 +631,7 @@ struct arm_smmu_domain { struct io_pgtable_ops *pgtbl_ops; boolnon_strict; + int nr_ats_masters; enum arm_smmu_domain_stage stage; union { @@ -1531,7 +1532,16 @@ static int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, struct arm_smmu_cmdq_ent cmd; struct arm_smmu_master *master; - if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS)) + /* +* The protectiom of spinlock(_domain->devices_lock) is omitted. +* Because for a given master, its map/unmap operations should only be +* happened after it has been attached and before it has been detached. +* So that, if at least one master need to be atc invalidated, the +* value of smmu_domain->nr_ats_masters can not be zero. +* +* This can alleviate performance loss in multi-core scenarios. +*/ + if (!smmu_domain->nr_ats_masters) return 0; arm_smmu_atc_inv_to_cmd(ssid, iova, size, ); @@ -1913,6 +1923,7 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master) spin_lock_irqsave(_domain->devices_lock, flags); list_del(>domain_head); + smmu_domain->nr_ats_masters--; spin_unlock_irqrestore(_domain->devices_lock, flags); master->domain = NULL; @@ -1968,6 +1979,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) spin_lock_irqsave(_domain->devices_lock, flags); list_add(>domain_head, _domain->devices); + smmu_domain->nr_ats_masters++; spin_unlock_irqrestore(_domain->devices_lock, flags); out_unlock: mutex_unlock(_domain->init_mutex); -- 1.8.3 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 1/2] iommu/arm-smmu-v3: don't add a master into smmu_domain before it's ready
Once a master has been added into smmu_domain->devices, it may immediately be scaned in arm_smmu_unmap()-->arm_smmu_atc_inv_domain(). From a logical point of view, the master should be added into smmu_domain after it has been completely initialized. Signed-off-by: Zhen Lei --- drivers/iommu/arm-smmu-v3.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index a9a9fabd396804a..29056d9bb12aa01 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -1958,10 +1958,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) master->domain = smmu_domain; - spin_lock_irqsave(_domain->devices_lock, flags); - list_add(>domain_head, _domain->devices); - spin_unlock_irqrestore(_domain->devices_lock, flags); - if (smmu_domain->stage != ARM_SMMU_DOMAIN_BYPASS) arm_smmu_enable_ats(master); @@ -1969,6 +1965,10 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) arm_smmu_write_ctx_desc(smmu, _domain->s1_cfg); arm_smmu_install_ste_for_dev(master); + + spin_lock_irqsave(_domain->devices_lock, flags); + list_add(>domain_head, _domain->devices); + spin_unlock_irqrestore(_domain->devices_lock, flags); out_unlock: mutex_unlock(_domain->init_mutex); return ret; -- 1.8.3
Re: [PATCH 07/10] iommu: Print default domain type on boot
Hi, On 8/14/19 9:38 PM, Joerg Roedel wrote: From: Joerg Roedel Introduce a subsys_initcall for IOMMU code and use it to print the default domain type at boot. Signed-off-by: Joerg Roedel --- drivers/iommu/iommu.c | 30 +- 1 file changed, 29 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index e1feb4061b8b..233bc22b487e 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -93,12 +93,40 @@ struct iommu_group_attribute iommu_group_attr_##_name = \ static LIST_HEAD(iommu_device_list); static DEFINE_SPINLOCK(iommu_device_lock); +/* + * Use a function instead of an array here because the domain-type is a + * bit-field, so an array would waste memory. + */ +static const char *iommu_domain_type_str(unsigned int t) +{ + switch (t) { + case IOMMU_DOMAIN_BLOCKED: + return "Blocked"; + case IOMMU_DOMAIN_IDENTITY: + return "Passthrough"; + case IOMMU_DOMAIN_UNMANAGED: + return "Unmanaged"; + case IOMMU_DOMAIN_DMA: + return "Translated"; + default: + return "Unknown"; + } +} Run scripts/checkpatch.pl: ERROR: switch and case should be at the same indent #28: FILE: drivers/iommu/iommu.c:102: + switch (t) { + case IOMMU_DOMAIN_BLOCKED: [...] + case IOMMU_DOMAIN_IDENTITY: [...] + case IOMMU_DOMAIN_UNMANAGED: [...] + case IOMMU_DOMAIN_DMA: [...] + default: Best regards, Lu Baolu + +static int __init iommu_subsys_init(void) +{ + pr_info("Default domain type: %s\n", + iommu_domain_type_str(iommu_def_domain_type)); + + return 0; +} +subsys_initcall(iommu_subsys_init); + int iommu_device_register(struct iommu_device *iommu) { spin_lock(_device_lock); list_add_tail(>list, _device_list); spin_unlock(_device_lock); - return 0; }
Re: [PATCH 06/10] iommu: Remember when default domain type was set on kernel command line
Hi, On 8/14/19 9:38 PM, Joerg Roedel wrote: From: Joerg Roedel Introduce an extensible concept to remember when certain configuration settings for the IOMMU code have been set on the kernel command line. This will be used later to prevent overwriting these settings with other defaults. Signed-off-by: Joerg Roedel --- drivers/iommu/iommu.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index f187e85a074b..e1feb4061b8b 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -32,6 +32,7 @@ static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY; static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_DMA; #endif static bool iommu_dma_strict __read_mostly = true; +static u32 iommu_cmd_line __read_mostly; struct iommu_group { struct kobject kobj; @@ -68,6 +69,18 @@ static const char * const iommu_group_resv_type_string[] = { [IOMMU_RESV_SW_MSI] = "msi", }; +#define IOMMU_CMD_LINE_DMA_API (1 << 0) Prefer BIT() micro? + +static void iommu_set_cmd_line_dma_api(void) +{ + iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API; +} + +static bool __maybe_unused iommu_cmd_line_dma_api(void) +{ + return !!(iommu_cmd_line & IOMMU_CMD_LINE_DMA_API); +} + #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store) \ struct iommu_group_attribute iommu_group_attr_##_name = \ __ATTR(_name, _mode, _show, _store) @@ -165,6 +178,8 @@ static int __init iommu_set_def_domain_type(char *str) if (ret) return ret; + iommu_set_cmd_line_dma_api(); IOMMU command line is also set in other places, for example, iommu_setup() (arch/x86/kernel/pci-dma.c). Need to call this there as well? Best regards, Lu Baolu + iommu_def_domain_type = pt ? IOMMU_DOMAIN_IDENTITY : IOMMU_DOMAIN_DMA; return 0; }
Re: [PATCH 04/10] x86/dma: Get rid of iommu_pass_through
Hi, On 8/14/19 9:38 PM, Joerg Roedel wrote: From: Joerg Roedel This variable has no users anymore. Remove it and tell the IOMMU code via its new functions about requested DMA modes. Signed-off-by: Joerg Roedel This will also simplify the procedures in iommu_probe_device() on x86 platforms. Reviewed-by: Lu Baolu --- arch/x86/include/asm/iommu.h | 1 - arch/x86/kernel/pci-dma.c| 11 +++ 2 files changed, 3 insertions(+), 9 deletions(-) diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h index baedab8ac538..b91623d521d9 100644 --- a/arch/x86/include/asm/iommu.h +++ b/arch/x86/include/asm/iommu.h @@ -4,7 +4,6 @@ extern int force_iommu, no_iommu; extern int iommu_detected; -extern int iommu_pass_through; /* 10 seconds */ #define DMAR_OPERATION_TIMEOUT ((cycles_t) tsc_khz*10*1000) diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c index f62b498b18fb..a6fd479d4a71 100644 --- a/arch/x86/kernel/pci-dma.c +++ b/arch/x86/kernel/pci-dma.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 #include #include +#include #include #include #include @@ -43,12 +44,6 @@ int iommu_detected __read_mostly = 0; * It is also possible to disable by default in kernel config, and enable with * iommu=nopt at boot time. */ -#ifdef CONFIG_IOMMU_DEFAULT_PASSTHROUGH -int iommu_pass_through __read_mostly = 1; -#else -int iommu_pass_through __read_mostly; -#endif - extern struct iommu_table_entry __iommu_table[], __iommu_table_end[]; void __init pci_iommu_alloc(void) @@ -120,9 +115,9 @@ static __init int iommu_setup(char *p) swiotlb = 1; #endif if (!strncmp(p, "pt", 2)) - iommu_pass_through = 1; + iommu_set_default_passthrough(); if (!strncmp(p, "nopt", 4)) - iommu_pass_through = 0; + iommu_set_default_translated(); gart_parse_options(p); ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 03/10] iommu/vt-d: Request passthrough mode from IOMMU core
Hi Joerg, On 8/14/19 9:38 PM, Joerg Roedel wrote: From: Joerg Roedel Get rid of the iommu_pass_through variable and request passthrough mode via the new iommu core function. Signed-off-by: Joerg Roedel Looks good to me. Reviewed-by: Lu Baolu --- drivers/iommu/intel-iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index bdaed2da8a55..234bc2b55c59 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -3267,7 +3267,7 @@ static int __init init_dmars(void) iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH); } - if (iommu_pass_through) + if (iommu_default_passthrough()) iommu_identity_mapping |= IDENTMAP_ALL; #ifdef CONFIG_INTEL_IOMMU_BROKEN_GFX_WA
Re: [PATCH v3 hmm 03/11] mm/mmu_notifiers: add a get/put scheme for the registration
On Wed, Aug 14, 2019 at 02:20:31PM -0700, Ralph Campbell wrote: > > On 8/6/19 4:15 PM, Jason Gunthorpe wrote: > > From: Jason Gunthorpe > > > > Many places in the kernel have a flow where userspace will create some > > object and that object will need to connect to the subsystem's > > mmu_notifier subscription for the duration of its lifetime. > > > > In this case the subsystem is usually tracking multiple mm_structs and it > > is difficult to keep track of what struct mmu_notifier's have been > > allocated for what mm's. > > > > Since this has been open coded in a variety of exciting ways, provide core > > functionality to do this safely. > > > > This approach uses the strct mmu_notifier_ops * as a key to determine if > > s/strct/struct Yes, thanks for all of this, I like having comments, but I'm a terrible proofreader :( Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 hmm 00/11] Add mmu_notifier_get/put for managing mmu notifier registrations
On 8/6/19 4:15 PM, Jason Gunthorpe wrote: From: Jason Gunthorpe This series introduces a new registration flow for mmu_notifiers based on the idea that the user would like to get a single refcounted piece of memory for a mm, keyed to its use. For instance many users of mmu_notifiers use an interval tree or similar to dispatch notifications to some object. There are many objects but only one notifier subscription per mm holding the tree. Of the 12 places that call mmu_notifier_register: - 7 are maintaining some kind of obvious mapping of mm_struct to mmu_notifier registration, ie in some linked list or hash table. Of the 7 this series converts 4 (gru, hmm, RDMA, radeon) - 3 (hfi1, gntdev, vhost) are registering multiple notifiers, but each one immediately does some VA range filtering, ie with an interval tree. These would be better with a global subsystem-wide range filter and could convert to this API. - 2 (kvm, amd_iommu) are deliberately using a single mm at a time, and really can't use this API. One of the intel-svm's modes is also in this list The 3/7 unconverted drivers are: - intel-svm This driver tracks mm's in a global linked list 'global_svm_list' and would benefit from this API. Its flow is a bit complex, since it also wants a set of non-shared notifiers. - i915_gem_usrptr This driver tracks mm's in a per-device hash table (dev_priv->mm_structs), but only has an optional use of mmu_notifiers. Since it still seems to need the hash table it is difficult to convert. - amdkfd/kfd_process This driver is using a global SRCU hash table to track mm's The control flow here is very complicated and the driver is relying on this hash table to be fast on the ioctl syscall path. It would definitely benefit, but only if the ioctl path didn't need to do the search so often. This series is already entangled with patches in the hmm & RDMA tree and will require some git topic branches for the RDMA ODP stuff. I intend for it to go through the hmm tree. There is a git version here: https://github.com/jgunthorpe/linux/commits/mmu_notifier Which has the required pre-patches for the RDMA ODP conversion that are still being reviewed. Jason Gunthorpe (11): mm/mmu_notifiers: hoist do_mmu_notifier_register down_write to the caller mm/mmu_notifiers: do not speculatively allocate a mmu_notifier_mm mm/mmu_notifiers: add a get/put scheme for the registration misc/sgi-gru: use mmu_notifier_get/put for struct gru_mm_struct hmm: use mmu_notifier_get/put for 'struct hmm' RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm' RDMA/odp: remove ib_ucontext from ib_umem drm/radeon: use mmu_notifier_get/put for struct radeon_mn drm/amdkfd: fix a use after free race with mmu_notifer unregister drm/amdkfd: use mmu_notifier_put mm/mmu_notifiers: remove unregister_no_release drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 + drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 3 - drivers/gpu/drm/amd/amdkfd/kfd_process.c | 88 - drivers/gpu/drm/nouveau/nouveau_drm.c| 3 + drivers/gpu/drm/radeon/radeon.h | 3 - drivers/gpu/drm/radeon/radeon_device.c | 2 - drivers/gpu/drm/radeon/radeon_drv.c | 2 + drivers/gpu/drm/radeon/radeon_mn.c | 157 drivers/infiniband/core/umem.c | 4 +- drivers/infiniband/core/umem_odp.c | 183 ++ drivers/infiniband/core/uverbs_cmd.c | 3 - drivers/infiniband/core/uverbs_main.c| 1 + drivers/infiniband/hw/mlx5/main.c| 5 - drivers/misc/sgi-gru/grufile.c | 1 + drivers/misc/sgi-gru/grutables.h | 2 - drivers/misc/sgi-gru/grutlbpurge.c | 84 +++-- include/linux/hmm.h | 12 +- include/linux/mm_types.h | 6 - include/linux/mmu_notifier.h | 40 +++- include/rdma/ib_umem.h | 2 +- include/rdma/ib_umem_odp.h | 10 +- include/rdma/ib_verbs.h | 3 - kernel/fork.c| 1 - mm/hmm.c | 121 +++- mm/mmu_notifier.c| 230 +-- 25 files changed, 408 insertions(+), 559 deletions(-) For the core MM, HMM, and nouveau changes you can add: Tested-by: Ralph Campbell ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 hmm 11/11] mm/mmu_notifiers: remove unregister_no_release
On 8/6/19 4:15 PM, Jason Gunthorpe wrote: From: Jason Gunthorpe mmu_notifier_unregister_no_release() and mmu_notifier_call_srcu() no longer have any users, they have all been converted to use mmu_notifier_put(). So delete this difficult to use interface. Signed-off-by: Jason Gunthorpe Reviewed-by: Ralph Campbell ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 hmm 05/11] hmm: use mmu_notifier_get/put for 'struct hmm'
On 8/6/19 4:15 PM, Jason Gunthorpe wrote: From: Jason Gunthorpe This is a significant simplification, it eliminates all the remaining 'hmm' stuff in mm_struct, eliminates krefing along the critical notifier paths, and takes away all the ugly locking and abuse of page_table_lock. mmu_notifier_get() provides the single struct hmm per struct mm which eliminates mm->hmm. It also directly guarantees that no mmu_notifier op callback is callable while concurrent free is possible, this eliminates all the krefs inside the mmu_notifier callbacks. The remaining krefs in the range code were overly cautious, drivers are already not permitted to free the mirror while a range exists. Signed-off-by: Jason Gunthorpe Looks good. Reviewed-by: Ralph Campbell ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 hmm 03/11] mm/mmu_notifiers: add a get/put scheme for the registration
On 8/6/19 4:15 PM, Jason Gunthorpe wrote: From: Jason Gunthorpe Many places in the kernel have a flow where userspace will create some object and that object will need to connect to the subsystem's mmu_notifier subscription for the duration of its lifetime. In this case the subsystem is usually tracking multiple mm_structs and it is difficult to keep track of what struct mmu_notifier's have been allocated for what mm's. Since this has been open coded in a variety of exciting ways, provide core functionality to do this safely. This approach uses the strct mmu_notifier_ops * as a key to determine if s/strct/struct the subsystem has a notifier registered on the mm or not. If there is a registration then the existing notifier struct is returned, otherwise the ops->alloc_notifiers() is used to create a new per-subsystem notifier for the mm. The destroy side incorporates an async call_srcu based destruction which will avoid bugs in the callers such as commit 6d7c3cde93c1 ("mm/hmm: fix use after free with struct hmm in the mmu notifiers"). Since we are inside the mmu notifier core locking is fairly simple, the allocation uses the same approach as for mmu_notifier_mm, the write side of the mmap_sem makes everything deterministic and we only need to do hlist_add_head_rcu() under the mm_take_all_locks(). The new users count and the discoverability in the hlist is fully serialized by the mmu_notifier_mm->lock. Co-developed-by: Christoph Hellwig Signed-off-by: Christoph Hellwig Signed-off-by: Jason Gunthorpe Reviewed-by: Ralph Campbell --- include/linux/mmu_notifier.h | 35 mm/mmu_notifier.c| 156 +-- 2 files changed, 185 insertions(+), 6 deletions(-) diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index b6c004bd9f6ad9..31aa971315a142 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -211,6 +211,19 @@ struct mmu_notifier_ops { */ void (*invalidate_range)(struct mmu_notifier *mn, struct mm_struct *mm, unsigned long start, unsigned long end); + + /* +* These callbacks are used with the get/put interface to manage the +* lifetime of the mmu_notifier memory. alloc_notifier() returns a new +* notifier for use with the mm. +* +* free_notifier() is only called after the mmu_notifier has been +* fully put, calls to any ops callback are prevented and no ops +* callbacks are currently running. It is called from a SRCU callback +* and cannot sleep. +*/ + struct mmu_notifier *(*alloc_notifier)(struct mm_struct *mm); + void (*free_notifier)(struct mmu_notifier *mn); }; /* @@ -227,6 +240,9 @@ struct mmu_notifier_ops { struct mmu_notifier { struct hlist_node hlist; const struct mmu_notifier_ops *ops; + struct mm_struct *mm; + struct rcu_head rcu; + unsigned int users; }; static inline int mm_has_notifiers(struct mm_struct *mm) @@ -234,6 +250,21 @@ static inline int mm_has_notifiers(struct mm_struct *mm) return unlikely(mm->mmu_notifier_mm); } +struct mmu_notifier *mmu_notifier_get_locked(const struct mmu_notifier_ops *ops, +struct mm_struct *mm); +static inline struct mmu_notifier * +mmu_notifier_get(const struct mmu_notifier_ops *ops, struct mm_struct *mm) +{ + struct mmu_notifier *ret; + + down_write(>mmap_sem); + ret = mmu_notifier_get_locked(ops, mm); + up_write(>mmap_sem); + return ret; +} +void mmu_notifier_put(struct mmu_notifier *mn); +void mmu_notifier_synchronize(void); + extern int mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm); extern int __mmu_notifier_register(struct mmu_notifier *mn, @@ -581,6 +612,10 @@ static inline void mmu_notifier_mm_destroy(struct mm_struct *mm) #define pudp_huge_clear_flush_notify pudp_huge_clear_flush #define set_pte_at_notify set_pte_at +static inline void mmu_notifier_synchronize(void) +{ +} + #endif /* CONFIG_MMU_NOTIFIER */ #endif /* _LINUX_MMU_NOTIFIER_H */ diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c index 696810f632ade1..4a770b5211b71d 100644 --- a/mm/mmu_notifier.c +++ b/mm/mmu_notifier.c @@ -248,6 +248,9 @@ int __mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm) lockdep_assert_held_write(>mmap_sem); BUG_ON(atomic_read(>mm_users) <= 0); + mn->mm = mm; + mn->users = 1; + if (!mm->mmu_notifier_mm) { /* * kmalloc cannot be called under mm_take_all_locks(), but we @@ -295,18 +298,24 @@ int __mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm) } EXPORT_SYMBOL_GPL(__mmu_notifier_register); -/* +/** + * mmu_notifier_register - Register a notifier on a mm + * @mn: The notifier to attach + *
Re: [PATCH v3 hmm 02/11] mm/mmu_notifiers: do not speculatively allocate a mmu_notifier_mm
On 8/6/19 4:15 PM, Jason Gunthorpe wrote: From: Jason Gunthorpe A prior commit e0f3c3f78da2 ("mm/mmu_notifier: init notifier if necessary") made an attempt at doing this, but had to be reverted as calling the GFP_KERNEL allocator under the i_mmap_mutex causes deadlock, see commit 35cfa2b0b491 ("mm/mmu_notifier: allocate mmu_notifier in advance"). However, we can avoid that problem by doing the allocation only under the mmap_sem, which is already happening. Since all writers to mm->mmu_notifier_mm hold the write side of the mmap_sem reading it under that sem is deterministic and we can use that to decide if the allocation path is required, without speculation. The actual update to mmu_notifier_mm must still be done under the mm_take_all_locks() to ensure read-side coherency. Signed-off-by: Jason Gunthorpe Looks good to me. Reviewed-by: Ralph Campbell ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 hmm 01/11] mm/mmu_notifiers: hoist do_mmu_notifier_register down_write to the caller
On 8/6/19 4:15 PM, Jason Gunthorpe wrote: From: Jason Gunthorpe This simplifies the code to not have so many one line functions and extra logic. __mmu_notifier_register() simply becomes the entry point to register the notifier, and the other one calls it under lock. Also add a lockdep_assert to check that the callers are holding the lock as expected. Suggested-by: Christoph Hellwig Signed-off-by: Jason Gunthorpe Nice clean up. Reviewed-by: Ralph Campbell ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 21/22] iommu/vt-d: Support flushing more translation cache types
On Thu, 18 Jul 2019 10:35:37 +0200 Auger Eric wrote: > Hi Jacob, > > On 6/9/19 3:44 PM, Jacob Pan wrote: > > When Shared Virtual Memory is exposed to a guest via vIOMMU, > > scalable IOTLB invalidation may be passed down from outside IOMMU > > subsystems. This patch adds invalidation functions that can be used > > for additional translation cache types. > > > > Signed-off-by: Jacob Pan > > --- > > drivers/iommu/dmar.c| 50 > > + > > include/linux/intel-iommu.h | 21 +++ 2 files > > changed, 67 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c > > index 6d969a1..0cda6fb 100644 > > --- a/drivers/iommu/dmar.c > > +++ b/drivers/iommu/dmar.c > > @@ -1357,6 +1357,21 @@ void qi_flush_iotlb(struct intel_iommu > > *iommu, u16 did, u64 addr, qi_submit_sync(, iommu); > > } > > > > +/* PASID-based IOTLB Invalidate */ > > +void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr, > > u32 pasid, > > + unsigned int size_order, u64 granu, int ih) > > +{ > > + struct qi_desc desc; > nit: you could also init to {}; > > + > > + desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) | > > + QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE; > > + desc.qw1 = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_IH(ih) | > > + QI_EIOTLB_AM(size_order); > > + desc.qw2 = 0; > > + desc.qw3 = 0; > > + qi_submit_sync(, iommu); > > +} > > + > > void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 > > pfsid, u16 qdep, u64 addr, unsigned mask) > > { > > @@ -1380,6 +1395,41 @@ void qi_flush_dev_iotlb(struct intel_iommu > > *iommu, u16 sid, u16 pfsid, qi_submit_sync(, iommu); > > } > > > > +/* PASID-based device IOTLB Invalidate */ > > +void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16 > > pfsid, > > + u32 pasid, u16 qdep, u64 addr, unsigned size, u64 > > granu) > s/size/size_order > > +{ > > + struct qi_desc desc; > > + > > + desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) | > > QI_DEV_EIOTLB_SID(sid) | > > + QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE | > > + QI_DEV_IOTLB_PFSID(pfsid); > maybe add a comment to remind MIP hint is sent to 0 as of now. > > + desc.qw1 = QI_DEV_EIOTLB_GLOB(granu); > > + > > + /* If S bit is 0, we only flush a single page. If S bit is > > set, > > +* The least significant zero bit indicates the size. VT-d > > spec > > +* 6.5.2.6 > > +*/ > > + if (!size) > > + desc.qw0 |= QI_DEV_EIOTLB_ADDR(addr) & > > ~QI_DEV_EIOTLB_SIZE; > this is qw1. my mistake. > > + else { > > + unsigned long mask = 1UL << (VTD_PAGE_SHIFT + > > size); > don't you miss a "- 1 " here? your are right > > + > > + desc.qw1 |= QI_DEV_EIOTLB_ADDR(addr & ~mask) | > > QI_DEV_EIOTLB_SIZE; > desc.qw1 |= addr & ~mask | QI_DEV_EIOTLB_SIZE; > ie. I don't think QI_DEV_EIOTLB_ADDR is useful here? > > > So won't the following lines do the job? > > unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size) -1; > desc.qw1 |= addr & ~mask; > if (size) > desc.qw1 |= QI_DEV_EIOTLB_SIZE that would work too, and simpler. thanks for the suggestion. will change. > > + } > > + qi_submit_sync(, iommu); > > +} > > + > > +void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 > > granu, int pasid) +{ > > + struct qi_desc desc; > > + > > + desc.qw0 = QI_PC_TYPE | QI_PC_DID(did) | QI_PC_GRAN(granu) > > | QI_PC_PASID(pasid); > nit: reorder the fields according to the spec, easier to check if any > is missing. sounds good. > > + desc.qw1 = 0; > > + desc.qw2 = 0; > > + desc.qw3 = 0; > > + qi_submit_sync(, iommu); > > +} > > /* > > * Disable Queued Invalidation interface. > > */ > > diff --git a/include/linux/intel-iommu.h > > b/include/linux/intel-iommu.h index 94d3a9a..1cdb35b 100644 > > --- a/include/linux/intel-iommu.h > > +++ b/include/linux/intel-iommu.h > > @@ -339,7 +339,7 @@ enum { > > #define QI_IOTLB_GRAN(gran)(((u64)gran) >> > > (DMA_TLB_FLUSH_GRANU_OFFSET-4)) #define QI_IOTLB_ADDR(addr) > > (((u64)addr) & VTD_PAGE_MASK) #define > > QI_IOTLB_IH(ih) (((u64)ih) << 6) -#define > > QI_IOTLB_AM(am) (((u8)am)) +#define > > QI_IOTLB_AM(am) (((u8)am) & 0x3f) > > #define QI_CC_FM(fm) (((u64)fm) << 48) > > #define QI_CC_SID(sid) (((u64)sid) << 32) > > @@ -357,17 +357,22 @@ enum { > > #define QI_PC_DID(did) (((u64)did) << 16) > > #define QI_PC_GRAN(gran) (((u64)gran) << 4) > > > > -#define QI_PC_ALL_PASIDS (QI_PC_TYPE | QI_PC_GRAN(0)) > > -#define QI_PC_PASID_SEL(QI_PC_TYPE | QI_PC_GRAN(1)) > > +/* PASID cache invalidation granu */ > > +#define QI_PC_ALL_PASIDS 0 > > +#define QI_PC_PASID_SEL1 > > > > #define QI_EIOTLB_ADDR(addr) ((u64)(addr) & VTD_PAGE_MASK) > > #define QI_EIOTLB_GL(gl) (((u64)gl) << 7) > > #define QI_EIOTLB_IH(ih) (((u64)ih) << 6) >
Re: [GIT PULL] dma mapping fixes for 5.3-rc
The pull request you sent on Wed, 14 Aug 2019 16:12:17 +0200: > git://git.infradead.org/users/hch/dma-mapping.git tags/dma-mapping-5.3-4 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/e83b009c5c366b678c7986fa6c1d38fed06c954c Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [git pull] IOMMU Fixes for Linux v5.3-rc4
The pull request you sent on Wed, 14 Aug 2019 16:09:09 +0200: > git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git > tags/iommu-fixes-v5.3-rc4 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/b5e33e44d994bb03c75f1901d47b1cf971f752a0 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 04/15] iommu/arm-smmu: Rework cb_base handling
On Fri, Aug 09, 2019 at 06:07:41PM +0100, Robin Murphy wrote: > To keep register-access quirks manageable, we want to structure things > to avoid needing too many individual overrides. It seems fairly clean to > have a single interface which handles both global and context registers > in terms of the architectural pages, so the first preparatory step is to > rework cb_base into a page number rather than an absolute address. > > Signed-off-by: Robin Murphy > --- > drivers/iommu/arm-smmu.c | 22 -- > 1 file changed, 12 insertions(+), 10 deletions(-) > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c > index d9a93e5f422f..463bc8d98adb 100644 > --- a/drivers/iommu/arm-smmu.c > +++ b/drivers/iommu/arm-smmu.c > @@ -95,7 +95,7 @@ > #endif > > /* Translation context bank */ > -#define ARM_SMMU_CB(smmu, n) ((smmu)->cb_base + ((n) << (smmu)->pgshift)) > +#define ARM_SMMU_CB(smmu, n) ((smmu)->base + (((smmu)->cb_base + (n)) << > (smmu)->pgshift)) > > #define MSI_IOVA_BASE0x800 > #define MSI_IOVA_LENGTH 0x10 > @@ -168,8 +168,8 @@ struct arm_smmu_device { > struct device *dev; > > void __iomem*base; > - void __iomem*cb_base; > - unsigned long pgshift; > + unsigned intcb_base; I think this is now a misnomer. Would you be able to rename it cb_pfn or something, please? Otherwise, this seems fine. Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 13/13] iommu/io-pgtable: Pass struct iommu_iotlb_gather to ->tlb_add_page()
With all the pieces in place, we can finally propagate the iommu_iotlb_gather structure from the call to unmap() down to the IOMMU drivers' implementation of ->tlb_add_page(). Currently everybody ignores it, but the machinery is now there to defer invalidation. Signed-off-by: Will Deacon --- drivers/iommu/arm-smmu-v3.c| 3 ++- drivers/iommu/arm-smmu.c | 3 ++- drivers/iommu/io-pgtable-arm-v7s.c | 23 ++- drivers/iommu/io-pgtable-arm.c | 22 ++ drivers/iommu/msm_iommu.c | 3 ++- drivers/iommu/mtk_iommu.c | 3 ++- drivers/iommu/qcom_iommu.c | 3 ++- include/linux/io-pgtable.h | 16 +--- 8 files changed, 47 insertions(+), 29 deletions(-) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 8e2e53079f48..d1ebc7103065 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -1596,7 +1596,8 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size, } while (size -= granule); } -static void arm_smmu_tlb_inv_page_nosync(unsigned long iova, size_t granule, +static void arm_smmu_tlb_inv_page_nosync(struct iommu_iotlb_gather *gather, +unsigned long iova, size_t granule, void *cookie) { arm_smmu_tlb_inv_range_nosync(iova, granule, granule, true, cookie); diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index f6689956ab6e..5598d0ff71a8 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -574,7 +574,8 @@ static void arm_smmu_tlb_inv_leaf(unsigned long iova, size_t size, ops->tlb_sync(cookie); } -static void arm_smmu_tlb_add_page(unsigned long iova, size_t granule, +static void arm_smmu_tlb_add_page(struct iommu_iotlb_gather *gather, + unsigned long iova, size_t granule, void *cookie) { struct arm_smmu_domain *smmu_domain = cookie; diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c index a7776e982b6c..18e7d212c7de 100644 --- a/drivers/iommu/io-pgtable-arm-v7s.c +++ b/drivers/iommu/io-pgtable-arm-v7s.c @@ -362,7 +362,8 @@ static bool arm_v7s_pte_is_cont(arm_v7s_iopte pte, int lvl) return false; } -static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *, unsigned long, +static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *, + struct iommu_iotlb_gather *, unsigned long, size_t, int, arm_v7s_iopte *); static int arm_v7s_init_pte(struct arm_v7s_io_pgtable *data, @@ -383,7 +384,7 @@ static int arm_v7s_init_pte(struct arm_v7s_io_pgtable *data, size_t sz = ARM_V7S_BLOCK_SIZE(lvl); tblp = ptep - ARM_V7S_LVL_IDX(iova, lvl); - if (WARN_ON(__arm_v7s_unmap(data, iova + i * sz, + if (WARN_ON(__arm_v7s_unmap(data, NULL, iova + i * sz, sz, lvl, tblp) != sz)) return -EINVAL; } else if (ptep[i]) { @@ -545,6 +546,7 @@ static arm_v7s_iopte arm_v7s_split_cont(struct arm_v7s_io_pgtable *data, } static size_t arm_v7s_split_blk_unmap(struct arm_v7s_io_pgtable *data, + struct iommu_iotlb_gather *gather, unsigned long iova, size_t size, arm_v7s_iopte blk_pte, arm_v7s_iopte *ptep) @@ -581,14 +583,15 @@ static size_t arm_v7s_split_blk_unmap(struct arm_v7s_io_pgtable *data, return 0; tablep = iopte_deref(pte, 1); - return __arm_v7s_unmap(data, iova, size, 2, tablep); + return __arm_v7s_unmap(data, gather, iova, size, 2, tablep); } - io_pgtable_tlb_add_page(>iop, iova, size); + io_pgtable_tlb_add_page(>iop, gather, iova, size); return size; } static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data, + struct iommu_iotlb_gather *gather, unsigned long iova, size_t size, int lvl, arm_v7s_iopte *ptep) { @@ -647,7 +650,7 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data, */ smp_wmb(); } else { - io_pgtable_tlb_add_page(iop, iova, blk_size); + io_pgtable_tlb_add_page(iop, gather, iova, blk_size); } iova += blk_size; } @@ -657,12 +660,13 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data, * Insert a table at the next level to map the old region,
[PATCH 03/13] iommu/io-pgtable: Rename iommu_gather_ops to iommu_flush_ops
In preparation for TLB flush gathering in the IOMMU API, rename the iommu_gather_ops structure in io-pgtable to iommu_flush_ops, which better describes its purpose and avoids the potential for confusion between different levels of the API. $ find linux/ -type f -name '*.[ch]' | xargs sed -i 's/gather_ops/flush_ops/g' Signed-off-by: Will Deacon --- drivers/gpu/drm/panfrost/panfrost_mmu.c | 2 +- drivers/iommu/arm-smmu-v3.c | 4 ++-- drivers/iommu/arm-smmu.c| 8 drivers/iommu/io-pgtable-arm-v7s.c | 2 +- drivers/iommu/io-pgtable-arm.c | 2 +- drivers/iommu/ipmmu-vmsa.c | 4 ++-- drivers/iommu/msm_iommu.c | 4 ++-- drivers/iommu/mtk_iommu.c | 4 ++-- drivers/iommu/qcom_iommu.c | 4 ++-- include/linux/io-pgtable.h | 6 +++--- 10 files changed, 20 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index 92ac995dd9c6..17bceb11e708 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -257,7 +257,7 @@ static void mmu_tlb_sync_context(void *cookie) // TODO: Wait 1000 GPU cycles for HW_ISSUE_6367/T60X } -static const struct iommu_gather_ops mmu_tlb_ops = { +static const struct iommu_flush_ops mmu_tlb_ops = { .tlb_flush_all = mmu_tlb_inv_context_s1, .tlb_add_flush = mmu_tlb_inv_range_nosync, .tlb_sync = mmu_tlb_sync_context, diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index a9a9fabd3968..7e137e1e28f1 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -1603,7 +1603,7 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size, } while (size -= granule); } -static const struct iommu_gather_ops arm_smmu_gather_ops = { +static const struct iommu_flush_ops arm_smmu_flush_ops = { .tlb_flush_all = arm_smmu_tlb_inv_context, .tlb_add_flush = arm_smmu_tlb_inv_range_nosync, .tlb_sync = arm_smmu_tlb_sync, @@ -1796,7 +1796,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain) .ias= ias, .oas= oas, .coherent_walk = smmu->features & ARM_SMMU_FEAT_COHERENCY, - .tlb= _smmu_gather_ops, + .tlb= _smmu_flush_ops, .iommu_dev = smmu->dev, }; diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 64977c131ee6..dc08db347ef3 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -251,7 +251,7 @@ enum arm_smmu_domain_stage { struct arm_smmu_domain { struct arm_smmu_device *smmu; struct io_pgtable_ops *pgtbl_ops; - const struct iommu_gather_ops *tlb_ops; + const struct iommu_flush_ops*tlb_ops; struct arm_smmu_cfg cfg; enum arm_smmu_domain_stage stage; boolnon_strict; @@ -547,19 +547,19 @@ static void arm_smmu_tlb_inv_vmid_nosync(unsigned long iova, size_t size, writel_relaxed(smmu_domain->cfg.vmid, base + ARM_SMMU_GR0_TLBIVMID); } -static const struct iommu_gather_ops arm_smmu_s1_tlb_ops = { +static const struct iommu_flush_ops arm_smmu_s1_tlb_ops = { .tlb_flush_all = arm_smmu_tlb_inv_context_s1, .tlb_add_flush = arm_smmu_tlb_inv_range_nosync, .tlb_sync = arm_smmu_tlb_sync_context, }; -static const struct iommu_gather_ops arm_smmu_s2_tlb_ops_v2 = { +static const struct iommu_flush_ops arm_smmu_s2_tlb_ops_v2 = { .tlb_flush_all = arm_smmu_tlb_inv_context_s2, .tlb_add_flush = arm_smmu_tlb_inv_range_nosync, .tlb_sync = arm_smmu_tlb_sync_context, }; -static const struct iommu_gather_ops arm_smmu_s2_tlb_ops_v1 = { +static const struct iommu_flush_ops arm_smmu_s2_tlb_ops_v1 = { .tlb_flush_all = arm_smmu_tlb_inv_context_s2, .tlb_add_flush = arm_smmu_tlb_inv_vmid_nosync, .tlb_sync = arm_smmu_tlb_sync_vmid, diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c index a62733c6a632..116f97ee991e 100644 --- a/drivers/iommu/io-pgtable-arm-v7s.c +++ b/drivers/iommu/io-pgtable-arm-v7s.c @@ -817,7 +817,7 @@ static void dummy_tlb_sync(void *cookie) WARN_ON(cookie != cfg_cookie); } -static const struct iommu_gather_ops dummy_tlb_ops = { +static const struct iommu_flush_ops dummy_tlb_ops = { .tlb_flush_all = dummy_tlb_flush_all, .tlb_add_flush = dummy_tlb_add_flush, .tlb_sync = dummy_tlb_sync, diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 0d6633921c1e..402f913b6f6d 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -1081,7 +1081,7 @@ static void dummy_tlb_sync(void *cookie)
[PATCH 06/13] iommu: Pass struct iommu_iotlb_gather to ->unmap() and ->iotlb_sync()
To allow IOMMU drivers to batch up TLB flushing operations and postpone them until ->iotlb_sync() is called, extend the prototypes for the ->unmap() and ->iotlb_sync() IOMMU ops callbacks to take a pointer to the current iommu_iotlb_gather structure. All affected IOMMU drivers are updated, but there should be no functional change since the extra parameter is ignored for now. Signed-off-by: Will Deacon --- drivers/iommu/amd_iommu.c | 11 +-- drivers/iommu/arm-smmu-v3.c| 7 --- drivers/iommu/arm-smmu.c | 5 +++-- drivers/iommu/exynos-iommu.c | 3 ++- drivers/iommu/intel-iommu.c| 3 ++- drivers/iommu/iommu.c | 2 +- drivers/iommu/ipmmu-vmsa.c | 12 +--- drivers/iommu/msm_iommu.c | 2 +- drivers/iommu/mtk_iommu.c | 13 ++--- drivers/iommu/mtk_iommu_v1.c | 3 ++- drivers/iommu/omap-iommu.c | 2 +- drivers/iommu/qcom_iommu.c | 12 +--- drivers/iommu/rockchip-iommu.c | 2 +- drivers/iommu/s390-iommu.c | 3 ++- drivers/iommu/tegra-gart.c | 12 +--- drivers/iommu/tegra-smmu.c | 2 +- drivers/iommu/virtio-iommu.c | 5 +++-- include/linux/iommu.h | 7 --- 18 files changed, 73 insertions(+), 33 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index f93b148cf55e..29eeea914660 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -3055,7 +3055,8 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova, } static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova, - size_t page_size) + size_t page_size, + struct iommu_iotlb_gather *gather) { struct protection_domain *domain = to_pdomain(dom); size_t unmap_size; @@ -3196,6 +3197,12 @@ static void amd_iommu_flush_iotlb_all(struct iommu_domain *domain) domain_flush_complete(dom); } +static void amd_iommu_iotlb_sync(struct iommu_domain *domain, +struct iommu_iotlb_gather *gather) +{ + amd_iommu_flush_iotlb_all(domain); +} + const struct iommu_ops amd_iommu_ops = { .capable = amd_iommu_capable, .domain_alloc = amd_iommu_domain_alloc, @@ -3214,7 +3221,7 @@ const struct iommu_ops amd_iommu_ops = { .is_attach_deferred = amd_iommu_is_attach_deferred, .pgsize_bitmap = AMD_IOMMU_PGSIZES, .flush_iotlb_all = amd_iommu_flush_iotlb_all, - .iotlb_sync = amd_iommu_flush_iotlb_all, + .iotlb_sync = amd_iommu_iotlb_sync, }; /* diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 7e137e1e28f1..80753b8ca054 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -1985,8 +1985,8 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova, return ops->map(ops, iova, paddr, size, prot); } -static size_t -arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size) +static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova, +size_t size, struct iommu_iotlb_gather *gather) { int ret; struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); @@ -2010,7 +2010,8 @@ static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain) arm_smmu_tlb_inv_context(smmu_domain); } -static void arm_smmu_iotlb_sync(struct iommu_domain *domain) +static void arm_smmu_iotlb_sync(struct iommu_domain *domain, + struct iommu_iotlb_gather *gather) { struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu; diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index dc08db347ef3..e535ae2a9e65 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -1301,7 +1301,7 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova, } static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova, -size_t size) +size_t size, struct iommu_iotlb_gather *gather) { struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops; struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu; @@ -1329,7 +1329,8 @@ static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain) } } -static void arm_smmu_iotlb_sync(struct iommu_domain *domain) +static void arm_smmu_iotlb_sync(struct iommu_domain *domain, + struct iommu_iotlb_gather *gather) { struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct arm_smmu_device *smmu = smmu_domain->smmu; diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c index b0c1e5f9daae..cf5af34cb681 100644 --- a/drivers/iommu/exynos-iommu.c +++
[PATCH 10/13] iommu/io-pgtable: Replace ->tlb_add_flush() with ->tlb_add_page()
The ->tlb_add_flush() callback in the io-pgtable API now looks a bit silly: - It takes a size and a granule, which are always the same - It takes a 'bool leaf', which is always true - It only ever flushes a single page With that in mind, replace it with an optional ->tlb_add_page() callback that drops the useless parameters. Signed-off-by: Will Deacon --- drivers/gpu/drm/panfrost/panfrost_mmu.c | 5 -- drivers/iommu/arm-smmu-v3.c | 8 ++- drivers/iommu/arm-smmu.c| 88 + drivers/iommu/io-pgtable-arm-v7s.c | 12 ++--- drivers/iommu/io-pgtable-arm.c | 11 ++--- drivers/iommu/ipmmu-vmsa.c | 7 --- drivers/iommu/msm_iommu.c | 7 ++- drivers/iommu/mtk_iommu.c | 8 ++- drivers/iommu/qcom_iommu.c | 8 ++- include/linux/io-pgtable.h | 22 - 10 files changed, 105 insertions(+), 71 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index 651858147bd6..ff9af320cacc 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -247,10 +247,6 @@ static void mmu_tlb_inv_context_s1(void *cookie) mmu_hw_do_operation(pfdev, 0, 0, ~0UL, AS_COMMAND_FLUSH_MEM); } -static void mmu_tlb_inv_range_nosync(unsigned long iova, size_t size, -size_t granule, bool leaf, void *cookie) -{} - static void mmu_tlb_sync_context(void *cookie) { //struct panfrost_device *pfdev = cookie; @@ -273,7 +269,6 @@ static const struct iommu_flush_ops mmu_tlb_ops = { .tlb_flush_all = mmu_tlb_inv_context_s1, .tlb_flush_walk = mmu_tlb_flush_walk, .tlb_flush_leaf = mmu_tlb_flush_leaf, - .tlb_add_flush = mmu_tlb_inv_range_nosync, .tlb_sync = mmu_tlb_sync_context, }; diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 79819b003b07..98c90a1b4b22 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -1603,6 +1603,12 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size, } while (size -= granule); } +static void arm_smmu_tlb_inv_page_nosync(unsigned long iova, size_t granule, +void *cookie) +{ + arm_smmu_tlb_inv_range_nosync(iova, granule, granule, true, cookie); +} + static void arm_smmu_tlb_inv_walk(unsigned long iova, size_t size, size_t granule, void *cookie) { @@ -1627,7 +1633,7 @@ static const struct iommu_flush_ops arm_smmu_flush_ops = { .tlb_flush_all = arm_smmu_tlb_inv_context, .tlb_flush_walk = arm_smmu_tlb_inv_walk, .tlb_flush_leaf = arm_smmu_tlb_inv_leaf, - .tlb_add_flush = arm_smmu_tlb_inv_range_nosync, + .tlb_add_page = arm_smmu_tlb_inv_page_nosync, .tlb_sync = arm_smmu_tlb_sync, }; diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index e9f01b860ae3..f056164a94b0 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -248,10 +248,16 @@ enum arm_smmu_domain_stage { ARM_SMMU_DOMAIN_BYPASS, }; +struct arm_smmu_flush_ops { + struct iommu_flush_ops tlb; + void (*tlb_inv_range)(unsigned long iova, size_t size, size_t granule, + bool leaf, void *cookie) +}; + struct arm_smmu_domain { struct arm_smmu_device *smmu; struct io_pgtable_ops *pgtbl_ops; - const struct iommu_flush_ops*tlb_ops; + const struct arm_smmu_flush_ops *flush_ops; struct arm_smmu_cfg cfg; enum arm_smmu_domain_stage stage; boolnon_strict; @@ -551,42 +557,62 @@ static void arm_smmu_tlb_inv_walk(unsigned long iova, size_t size, size_t granule, void *cookie) { struct arm_smmu_domain *smmu_domain = cookie; + const struct arm_smmu_flush_ops *ops = smmu_domain->flush_ops; - smmu_domain->tlb_ops->tlb_add_flush(iova, size, granule, false, cookie); - smmu_domain->tlb_ops->tlb_sync(cookie); + ops->tlb_inv_range(iova, size, granule, false, cookie); + ops->tlb.tlb_sync(cookie); } static void arm_smmu_tlb_inv_leaf(unsigned long iova, size_t size, size_t granule, void *cookie) { struct arm_smmu_domain *smmu_domain = cookie; + const struct arm_smmu_flush_ops *ops = smmu_domain->flush_ops; + + ops->tlb_inv_range(iova, size, granule, true, cookie); + ops->tlb.tlb_sync(cookie); +} + +static void arm_smmu_tlb_add_page(unsigned long iova, size_t granule, + void *cookie) +{ + struct arm_smmu_domain *smmu_domain = cookie; + const struct arm_smmu_flush_ops *ops = smmu_domain->flush_ops; - smmu_domain->tlb_ops->tlb_add_flush(iova,
[PATCH 11/13] iommu/io-pgtable: Remove unused ->tlb_sync() callback
The ->tlb_sync() callback is no longer used, so it can be removed. Signed-off-by: Will Deacon --- drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 - drivers/iommu/arm-smmu-v3.c | 8 drivers/iommu/arm-smmu.c| 17 + drivers/iommu/io-pgtable-arm-v7s.c | 6 -- drivers/iommu/io-pgtable-arm.c | 6 -- drivers/iommu/ipmmu-vmsa.c | 1 - drivers/iommu/msm_iommu.c | 20 +++- drivers/iommu/mtk_iommu.c | 1 - drivers/iommu/qcom_iommu.c | 1 - include/linux/io-pgtable.h | 9 - 10 files changed, 16 insertions(+), 54 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index ff9af320cacc..de22a2276e00 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -269,7 +269,6 @@ static const struct iommu_flush_ops mmu_tlb_ops = { .tlb_flush_all = mmu_tlb_inv_context_s1, .tlb_flush_walk = mmu_tlb_flush_walk, .tlb_flush_leaf = mmu_tlb_flush_leaf, - .tlb_sync = mmu_tlb_sync_context, }; static const char *access_type_name(struct panfrost_device *pfdev, diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 98c90a1b4b22..231093413ff9 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -1545,13 +1545,6 @@ static int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, } /* IO_PGTABLE API */ -static void arm_smmu_tlb_sync(void *cookie) -{ - struct arm_smmu_domain *smmu_domain = cookie; - - arm_smmu_cmdq_issue_sync(smmu_domain->smmu); -} - static void arm_smmu_tlb_inv_context(void *cookie) { struct arm_smmu_domain *smmu_domain = cookie; @@ -1634,7 +1627,6 @@ static const struct iommu_flush_ops arm_smmu_flush_ops = { .tlb_flush_walk = arm_smmu_tlb_inv_walk, .tlb_flush_leaf = arm_smmu_tlb_inv_leaf, .tlb_add_page = arm_smmu_tlb_inv_page_nosync, - .tlb_sync = arm_smmu_tlb_sync, }; /* IOMMU API */ diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index f056164a94b0..07a267c437d6 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -251,7 +251,8 @@ enum arm_smmu_domain_stage { struct arm_smmu_flush_ops { struct iommu_flush_ops tlb; void (*tlb_inv_range)(unsigned long iova, size_t size, size_t granule, - bool leaf, void *cookie) + bool leaf, void *cookie); + void (*tlb_sync)(void *cookie); }; struct arm_smmu_domain { @@ -539,7 +540,7 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size, * On MMU-401 at least, the cost of firing off multiple TLBIVMIDs appears * almost negligible, but the benefit of getting the first one in as far ahead * of the sync as possible is significant, hence we don't just make this a - * no-op and set .tlb_sync to arm_smmu_inv_context_s2() as you might think. + * no-op and set .tlb_sync to arm_smmu_tlb_inv_context_s2() as you might think. */ static void arm_smmu_tlb_inv_vmid_nosync(unsigned long iova, size_t size, size_t granule, bool leaf, void *cookie) @@ -560,7 +561,7 @@ static void arm_smmu_tlb_inv_walk(unsigned long iova, size_t size, const struct arm_smmu_flush_ops *ops = smmu_domain->flush_ops; ops->tlb_inv_range(iova, size, granule, false, cookie); - ops->tlb.tlb_sync(cookie); + ops->tlb_sync(cookie); } static void arm_smmu_tlb_inv_leaf(unsigned long iova, size_t size, @@ -570,7 +571,7 @@ static void arm_smmu_tlb_inv_leaf(unsigned long iova, size_t size, const struct arm_smmu_flush_ops *ops = smmu_domain->flush_ops; ops->tlb_inv_range(iova, size, granule, true, cookie); - ops->tlb.tlb_sync(cookie); + ops->tlb_sync(cookie); } static void arm_smmu_tlb_add_page(unsigned long iova, size_t granule, @@ -588,9 +589,9 @@ static const struct arm_smmu_flush_ops arm_smmu_s1_tlb_ops = { .tlb_flush_walk = arm_smmu_tlb_inv_walk, .tlb_flush_leaf = arm_smmu_tlb_inv_leaf, .tlb_add_page = arm_smmu_tlb_add_page, - .tlb_sync = arm_smmu_tlb_sync_context, }, .tlb_inv_range = arm_smmu_tlb_inv_range_nosync, + .tlb_sync = arm_smmu_tlb_sync_context, }; static const struct arm_smmu_flush_ops arm_smmu_s2_tlb_ops_v2 = { @@ -599,9 +600,9 @@ static const struct arm_smmu_flush_ops arm_smmu_s2_tlb_ops_v2 = { .tlb_flush_walk = arm_smmu_tlb_inv_walk, .tlb_flush_leaf = arm_smmu_tlb_inv_leaf, .tlb_add_page = arm_smmu_tlb_add_page, - .tlb_sync = arm_smmu_tlb_sync_context, }, .tlb_inv_range = arm_smmu_tlb_inv_range_nosync, +
[PATCH 08/13] iommu/io-pgtable: Hook up ->tlb_flush_walk() and ->tlb_flush_leaf() in drivers
Hook up ->tlb_flush_walk() and ->tlb_flush_leaf() in drivers using the io-pgtable API so that we can start making use of them in the page-table code. For now, they can just wrap the implementations of ->tlb_add_flush and ->tlb_sync pending future optimisation in each driver. Signed-off-by: Will Deacon --- drivers/gpu/drm/panfrost/panfrost_mmu.c | 14 ++ drivers/iommu/arm-smmu-v3.c | 22 ++ drivers/iommu/arm-smmu.c| 24 drivers/iommu/ipmmu-vmsa.c | 8 drivers/iommu/msm_iommu.c | 16 drivers/iommu/mtk_iommu.c | 16 drivers/iommu/qcom_iommu.c | 16 7 files changed, 116 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index 17bceb11e708..651858147bd6 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -257,8 +257,22 @@ static void mmu_tlb_sync_context(void *cookie) // TODO: Wait 1000 GPU cycles for HW_ISSUE_6367/T60X } +static void mmu_tlb_flush_walk(unsigned long iova, size_t size, size_t granule, + void *cookie) +{ + mmu_tlb_sync_context(cookie); +} + +static void mmu_tlb_flush_leaf(unsigned long iova, size_t size, size_t granule, + void *cookie) +{ + mmu_tlb_sync_context(cookie); +} + static const struct iommu_flush_ops mmu_tlb_ops = { .tlb_flush_all = mmu_tlb_inv_context_s1, + .tlb_flush_walk = mmu_tlb_flush_walk, + .tlb_flush_leaf = mmu_tlb_flush_leaf, .tlb_add_flush = mmu_tlb_inv_range_nosync, .tlb_sync = mmu_tlb_sync_context, }; diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 80753b8ca054..79819b003b07 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -1603,8 +1603,30 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size, } while (size -= granule); } +static void arm_smmu_tlb_inv_walk(unsigned long iova, size_t size, + size_t granule, void *cookie) +{ + struct arm_smmu_domain *smmu_domain = cookie; + struct arm_smmu_device *smmu = smmu_domain->smmu; + + arm_smmu_tlb_inv_range_nosync(iova, size, granule, false, cookie); + arm_smmu_cmdq_issue_sync(smmu); +} + +static void arm_smmu_tlb_inv_leaf(unsigned long iova, size_t size, + size_t granule, void *cookie) +{ + struct arm_smmu_domain *smmu_domain = cookie; + struct arm_smmu_device *smmu = smmu_domain->smmu; + + arm_smmu_tlb_inv_range_nosync(iova, size, granule, true, cookie); + arm_smmu_cmdq_issue_sync(smmu); +} + static const struct iommu_flush_ops arm_smmu_flush_ops = { .tlb_flush_all = arm_smmu_tlb_inv_context, + .tlb_flush_walk = arm_smmu_tlb_inv_walk, + .tlb_flush_leaf = arm_smmu_tlb_inv_leaf, .tlb_add_flush = arm_smmu_tlb_inv_range_nosync, .tlb_sync = arm_smmu_tlb_sync, }; diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index e535ae2a9e65..e9f01b860ae3 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -547,20 +547,44 @@ static void arm_smmu_tlb_inv_vmid_nosync(unsigned long iova, size_t size, writel_relaxed(smmu_domain->cfg.vmid, base + ARM_SMMU_GR0_TLBIVMID); } +static void arm_smmu_tlb_inv_walk(unsigned long iova, size_t size, + size_t granule, void *cookie) +{ + struct arm_smmu_domain *smmu_domain = cookie; + + smmu_domain->tlb_ops->tlb_add_flush(iova, size, granule, false, cookie); + smmu_domain->tlb_ops->tlb_sync(cookie); +} + +static void arm_smmu_tlb_inv_leaf(unsigned long iova, size_t size, + size_t granule, void *cookie) +{ + struct arm_smmu_domain *smmu_domain = cookie; + + smmu_domain->tlb_ops->tlb_add_flush(iova, size, granule, true, cookie); + smmu_domain->tlb_ops->tlb_sync(cookie); +} + static const struct iommu_flush_ops arm_smmu_s1_tlb_ops = { .tlb_flush_all = arm_smmu_tlb_inv_context_s1, + .tlb_flush_walk = arm_smmu_tlb_inv_walk, + .tlb_flush_leaf = arm_smmu_tlb_inv_leaf, .tlb_add_flush = arm_smmu_tlb_inv_range_nosync, .tlb_sync = arm_smmu_tlb_sync_context, }; static const struct iommu_flush_ops arm_smmu_s2_tlb_ops_v2 = { .tlb_flush_all = arm_smmu_tlb_inv_context_s2, + .tlb_flush_walk = arm_smmu_tlb_inv_walk, + .tlb_flush_leaf = arm_smmu_tlb_inv_leaf, .tlb_add_flush = arm_smmu_tlb_inv_range_nosync, .tlb_sync = arm_smmu_tlb_sync_context, }; static const struct iommu_flush_ops arm_smmu_s2_tlb_ops_v1 = { .tlb_flush_all = arm_smmu_tlb_inv_context_s2, + .tlb_flush_walk = arm_smmu_tlb_inv_walk, +
[PATCH 05/13] iommu: Introduce iommu_iotlb_gather_add_page()
Introduce a helper function for drivers to use when updating an iommu_iotlb_gather structure in response to an ->unmap() call, rather than having to open-code the logic in every page-table implementation. Signed-off-by: Will Deacon --- include/linux/iommu.h | 31 +++ 1 file changed, 31 insertions(+) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index aaf073010a9a..ad41aee55bc6 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -507,6 +507,31 @@ static inline void iommu_tlb_sync(struct iommu_domain *domain, iommu_iotlb_gather_init(iotlb_gather); } +static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain, + struct iommu_iotlb_gather *gather, + unsigned long iova, size_t size) +{ + unsigned long start = iova, end = start + size; + + /* +* If the new page is disjoint from the current range or is mapped at +* a different granularity, then sync the TLB so that the gather +* structure can be rewritten. +*/ + if (gather->pgsize != size || + end < gather->start || start > gather->end) { + if (gather->pgsize) + iommu_tlb_sync(domain, gather); + gather->pgsize = size; + } + + if (gather->end < end) + gather->end = end; + + if (gather->start > start) + gather->start = start; +} + /* PCI device grouping function */ extern struct iommu_group *pci_device_group(struct device *dev); /* Generic device grouping function */ @@ -847,6 +872,12 @@ static inline void iommu_iotlb_gather_init(struct iommu_iotlb_gather *gather) { } +static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain, + struct iommu_iotlb_gather *gather, + unsigned long iova, size_t size) +{ +} + static inline void iommu_device_unregister(struct iommu_device *iommu) { } -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 02/13] iommu/io-pgtable-arm: Remove redundant call to io_pgtable_tlb_sync()
Commit b6b65ca20bc9 ("iommu/io-pgtable-arm: Add support for non-strict mode") added an unconditional call to io_pgtable_tlb_sync() immediately after the case where we replace a block entry with a table entry during an unmap() call. This is redundant, since the IOMMU API will call iommu_tlb_sync() on this path and the patch in question mentions this: | To save having to reason about it too much, make sure the invalidation | in arm_lpae_split_blk_unmap() just performs its own unconditional sync | to minimise the window in which we're technically violating the break- | before-make requirement on a live mapping. This might work out redundant | with an outer-level sync for strict unmaps, but we'll never be splitting | blocks on a DMA fastpath anyway. However, this sync gets in the way of deferred TLB invalidation for leaf entries and is at best a questionable, unproven hack. Remove it. Signed-off-by: Will Deacon --- drivers/iommu/io-pgtable-arm-v7s.c | 1 - drivers/iommu/io-pgtable-arm.c | 1 - 2 files changed, 2 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c index 0fc8dfab2abf..a62733c6a632 100644 --- a/drivers/iommu/io-pgtable-arm-v7s.c +++ b/drivers/iommu/io-pgtable-arm-v7s.c @@ -587,7 +587,6 @@ static size_t arm_v7s_split_blk_unmap(struct arm_v7s_io_pgtable *data, } io_pgtable_tlb_add_flush(>iop, iova, size, size, true); - io_pgtable_tlb_sync(>iop); return size; } diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 161a7d56264d..0d6633921c1e 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -583,7 +583,6 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data, tablep = iopte_deref(pte, data); } else if (unmap_idx >= 0) { io_pgtable_tlb_add_flush(>iop, iova, size, size, true); - io_pgtable_tlb_sync(>iop); return size; } -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 07/13] iommu/io-pgtable: Introduce tlb_flush_walk() and tlb_flush_leaf()
In preparation for deferring TLB flushes to iommu_tlb_sync(), introduce two new synchronous invalidation helpers to the io-pgtable API, which allow the unmap() code to force invalidation in cases where it cannot be deferred (e.g. when replacing a table with a block or when TLBI_ON_MAP is set). Signed-off-by: Will Deacon --- include/linux/io-pgtable.h | 24 +++- 1 file changed, 19 insertions(+), 5 deletions(-) diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index 6292ea15d674..27275575b305 100644 --- a/include/linux/io-pgtable.h +++ b/include/linux/io-pgtable.h @@ -19,17 +19,31 @@ enum io_pgtable_fmt { /** * struct iommu_flush_ops - IOMMU callbacks for TLB and page table management. * - * @tlb_flush_all: Synchronously invalidate the entire TLB context. - * @tlb_add_flush: Queue up a TLB invalidation for a virtual address range. - * @tlb_sync: Ensure any queued TLB invalidation has taken effect, and - * any corresponding page table updates are visible to the - * IOMMU. + * @tlb_flush_all: Synchronously invalidate the entire TLB context. + * @tlb_flush_walk: Synchronously invalidate all intermediate TLB state + * (sometimes referred to as the "walk cache") for a virtual + * address range. + * @tlb_flush_leaf: Synchronously invalidate all leaf TLB state for a virtual + * address range. + * @tlb_add_flush: Optional callback to queue up leaf TLB invalidation for a + * virtual address range. This function exists purely as an + * optimisation for IOMMUs that cannot batch TLB invalidation + * operations efficiently and are therefore better suited to + * issuing them early rather than deferring them until + * iommu_tlb_sync(). + * @tlb_sync: Ensure any queued TLB invalidation has taken effect, and + * any corresponding page table updates are visible to the + * IOMMU. * * Note that these can all be called in atomic context and must therefore * not block. */ struct iommu_flush_ops { void (*tlb_flush_all)(void *cookie); + void (*tlb_flush_walk)(unsigned long iova, size_t size, size_t granule, + void *cookie); + void (*tlb_flush_leaf)(unsigned long iova, size_t size, size_t granule, + void *cookie); void (*tlb_add_flush)(unsigned long iova, size_t size, size_t granule, bool leaf, void *cookie); void (*tlb_sync)(void *cookie); -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 12/13] iommu/io-pgtable: Pass struct iommu_iotlb_gather to ->unmap()
Update the io-pgtable ->unmap() function to take an iommu_iotlb_gather pointer as an argument, and update the callers as appropriate. Signed-off-by: Will Deacon --- drivers/gpu/drm/panfrost/panfrost_mmu.c | 2 +- drivers/iommu/arm-smmu-v3.c | 2 +- drivers/iommu/arm-smmu.c| 2 +- drivers/iommu/io-pgtable-arm-v7s.c | 6 +++--- drivers/iommu/io-pgtable-arm.c | 7 +++ drivers/iommu/ipmmu-vmsa.c | 2 +- drivers/iommu/msm_iommu.c | 2 +- drivers/iommu/mtk_iommu.c | 2 +- drivers/iommu/qcom_iommu.c | 2 +- include/linux/io-pgtable.h | 4 +++- 10 files changed, 16 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index de22a2276e00..6e8145c36e93 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -222,7 +222,7 @@ void panfrost_mmu_unmap(struct panfrost_gem_object *bo) size_t unmapped_page; size_t pgsize = get_pgsize(iova, len - unmapped_len); - unmapped_page = ops->unmap(ops, iova, pgsize); + unmapped_page = ops->unmap(ops, iova, pgsize, NULL); if (!unmapped_page) break; diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 231093413ff9..8e2e53079f48 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -2015,7 +2015,7 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova, if (!ops) return 0; - ret = ops->unmap(ops, iova, size); + ret = ops->unmap(ops, iova, size, gather); if (ret && arm_smmu_atc_inv_domain(smmu_domain, 0, iova, size)) return 0; diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 07a267c437d6..f6689956ab6e 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -1362,7 +1362,7 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova, return 0; arm_smmu_rpm_get(smmu); - ret = ops->unmap(ops, iova, size); + ret = ops->unmap(ops, iova, size, gather); arm_smmu_rpm_put(smmu); return ret; diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c index 203894fb6765..a7776e982b6c 100644 --- a/drivers/iommu/io-pgtable-arm-v7s.c +++ b/drivers/iommu/io-pgtable-arm-v7s.c @@ -666,7 +666,7 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data, } static size_t arm_v7s_unmap(struct io_pgtable_ops *ops, unsigned long iova, - size_t size) + size_t size, struct iommu_iotlb_gather *gather) { struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops); @@ -892,7 +892,7 @@ static int __init arm_v7s_do_selftests(void) size = 1UL << __ffs(cfg.pgsize_bitmap); while (i < loopnr) { iova_start = i * SZ_16M; - if (ops->unmap(ops, iova_start + size, size) != size) + if (ops->unmap(ops, iova_start + size, size, NULL) != size) return __FAIL(ops); /* Remap of partial unmap */ @@ -910,7 +910,7 @@ static int __init arm_v7s_do_selftests(void) for_each_set_bit(i, _bitmap, BITS_PER_LONG) { size = 1UL << i; - if (ops->unmap(ops, iova, size) != size) + if (ops->unmap(ops, iova, size, NULL) != size) return __FAIL(ops); if (ops->iova_to_phys(ops, iova + 42)) diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index f35516744965..325430f8a0a1 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -12,7 +12,6 @@ #include #include #include -#include #include #include #include @@ -642,7 +641,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data, } static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova, -size_t size) +size_t size, struct iommu_iotlb_gather *gather) { struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops); arm_lpae_iopte *ptep = data->pgd; @@ -1167,7 +1166,7 @@ static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg) /* Partial unmap */ size = 1UL << __ffs(cfg->pgsize_bitmap); - if (ops->unmap(ops, SZ_1G + size, size) != size) + if (ops->unmap(ops, SZ_1G + size, size, NULL) != size) return __FAIL(ops, i); /* Remap of partial unmap */ @@ -1182,7 +1181,7 @@ static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg) for_each_set_bit(j, >pgsize_bitmap, BITS_PER_LONG) { size = 1UL << j; -
[PATCH 00/13] Rework IOMMU API to allow for batching of invalidation
Hi everybody, These are the core IOMMU changes that I have posted previously as part of my ongoing effort to reduce the lock contention of the SMMUv3 command queue. I thought it would be better to split this out as a separate series, since I think it's ready to go and all the driver conversions mean that it's quite a pain for me to maintain out of tree! The idea of the patch series is to allow TLB invalidation to be batched up into a new 'struct iommu_iotlb_gather' structure, which tracks the properties of the virtual address range being invalidated so that it can be deferred until the driver's ->iotlb_sync() function is called. This allows for more efficient invalidation on hardware that can submit multiple invalidations in one go. The previous series was included in: https://lkml.kernel.org/r/20190711171927.28803-1-w...@kernel.org The only real change since then is incorporating the newly merged virtio-iommu driver. If you'd like to play with the patches, then I've also pushed them here: https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/unmap but they should behave as a no-op on their own. Patches to convert the Arm SMMUv3 driver to the new API are here: https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/cmdq Cheers, Will --->8 Cc: Jean-Philippe Brucker Cc: Robin Murphy Cc: Jayachandran Chandrasekharan Nair Cc: Jan Glauber Cc: Jon Masters Cc: Eric Auger Cc: Zhen Lei Cc: Jonathan Cameron Cc: Vijay Kilary Cc: Joerg Roedel Cc: John Garry Cc: Alex Williamson Cc: Marek Szyprowski Cc: David Woodhouse Will Deacon (13): iommu: Remove empty iommu_tlb_range_add() callback from iommu_ops iommu/io-pgtable-arm: Remove redundant call to io_pgtable_tlb_sync() iommu/io-pgtable: Rename iommu_gather_ops to iommu_flush_ops iommu: Introduce struct iommu_iotlb_gather for batching TLB flushes iommu: Introduce iommu_iotlb_gather_add_page() iommu: Pass struct iommu_iotlb_gather to ->unmap() and ->iotlb_sync() iommu/io-pgtable: Introduce tlb_flush_walk() and tlb_flush_leaf() iommu/io-pgtable: Hook up ->tlb_flush_walk() and ->tlb_flush_leaf() in drivers iommu/io-pgtable-arm: Call ->tlb_flush_walk() and ->tlb_flush_leaf() iommu/io-pgtable: Replace ->tlb_add_flush() with ->tlb_add_page() iommu/io-pgtable: Remove unused ->tlb_sync() callback iommu/io-pgtable: Pass struct iommu_iotlb_gather to ->unmap() iommu/io-pgtable: Pass struct iommu_iotlb_gather to ->tlb_add_page() drivers/gpu/drm/panfrost/panfrost_mmu.c | 24 +--- drivers/iommu/amd_iommu.c | 11 ++-- drivers/iommu/arm-smmu-v3.c | 52 +++- drivers/iommu/arm-smmu.c| 103 drivers/iommu/dma-iommu.c | 9 ++- drivers/iommu/exynos-iommu.c| 3 +- drivers/iommu/intel-iommu.c | 3 +- drivers/iommu/io-pgtable-arm-v7s.c | 57 +- drivers/iommu/io-pgtable-arm.c | 48 --- drivers/iommu/iommu.c | 24 drivers/iommu/ipmmu-vmsa.c | 28 + drivers/iommu/msm_iommu.c | 42 + drivers/iommu/mtk_iommu.c | 45 +++--- drivers/iommu/mtk_iommu_v1.c| 3 +- drivers/iommu/omap-iommu.c | 2 +- drivers/iommu/qcom_iommu.c | 44 +++--- drivers/iommu/rockchip-iommu.c | 2 +- drivers/iommu/s390-iommu.c | 3 +- drivers/iommu/tegra-gart.c | 12 +++- drivers/iommu/tegra-smmu.c | 2 +- drivers/iommu/virtio-iommu.c| 5 +- drivers/vfio/vfio_iommu_type1.c | 27 + include/linux/io-pgtable.h | 57 -- include/linux/iommu.h | 92 +--- 24 files changed, 483 insertions(+), 215 deletions(-) -- 2.11.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 04/13] iommu: Introduce struct iommu_iotlb_gather for batching TLB flushes
To permit batching of TLB flushes across multiple calls to the IOMMU driver's ->unmap() implementation, introduce a new structure for tracking the address range to be flushed and the granularity at which the flushing is required. This is hooked into the IOMMU API and its caller are updated to make use of the new structure. Subsequent patches will plumb this into the IOMMU drivers as well, but for now the gathering information is ignored. Signed-off-by: Will Deacon --- drivers/iommu/dma-iommu.c | 9 +++-- drivers/iommu/iommu.c | 19 +++--- drivers/vfio/vfio_iommu_type1.c | 26 - include/linux/iommu.h | 43 + 4 files changed, 75 insertions(+), 22 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index a7f9c3edbcb2..80beb1f5994a 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -444,13 +444,18 @@ static void __iommu_dma_unmap(struct device *dev, dma_addr_t dma_addr, struct iommu_dma_cookie *cookie = domain->iova_cookie; struct iova_domain *iovad = >iovad; size_t iova_off = iova_offset(iovad, dma_addr); + struct iommu_iotlb_gather iotlb_gather; + size_t unmapped; dma_addr -= iova_off; size = iova_align(iovad, size + iova_off); + iommu_iotlb_gather_init(_gather); + + unmapped = iommu_unmap_fast(domain, dma_addr, size, _gather); + WARN_ON(unmapped != size); - WARN_ON(iommu_unmap_fast(domain, dma_addr, size) != size); if (!cookie->fq_domain) - iommu_tlb_sync(domain); + iommu_tlb_sync(domain, _gather); iommu_dma_free_iova(cookie, dma_addr, size); } diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 6d7b25fe2474..d67222fdfe44 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1862,7 +1862,7 @@ EXPORT_SYMBOL_GPL(iommu_map); static size_t __iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size, - bool sync) + struct iommu_iotlb_gather *iotlb_gather) { const struct iommu_ops *ops = domain->ops; size_t unmapped_page, unmapped = 0; @@ -1910,9 +1910,6 @@ static size_t __iommu_unmap(struct iommu_domain *domain, unmapped += unmapped_page; } - if (sync && ops->iotlb_sync) - ops->iotlb_sync(domain); - trace_unmap(orig_iova, size, unmapped); return unmapped; } @@ -1920,14 +1917,22 @@ static size_t __iommu_unmap(struct iommu_domain *domain, size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size) { - return __iommu_unmap(domain, iova, size, true); + struct iommu_iotlb_gather iotlb_gather; + size_t ret; + + iommu_iotlb_gather_init(_gather); + ret = __iommu_unmap(domain, iova, size, _gather); + iommu_tlb_sync(domain, _gather); + + return ret; } EXPORT_SYMBOL_GPL(iommu_unmap); size_t iommu_unmap_fast(struct iommu_domain *domain, - unsigned long iova, size_t size) + unsigned long iova, size_t size, + struct iommu_iotlb_gather *iotlb_gather) { - return __iommu_unmap(domain, iova, size, false); + return __iommu_unmap(domain, iova, size, iotlb_gather); } EXPORT_SYMBOL_GPL(iommu_unmap_fast); diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index fad7fd8c167c..ad830abe1021 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -650,12 +650,13 @@ static int vfio_iommu_type1_unpin_pages(void *iommu_data, } static long vfio_sync_unpin(struct vfio_dma *dma, struct vfio_domain *domain, - struct list_head *regions) + struct list_head *regions, + struct iommu_iotlb_gather *iotlb_gather) { long unlocked = 0; struct vfio_regions *entry, *next; - iommu_tlb_sync(domain->domain); + iommu_tlb_sync(domain->domain, iotlb_gather); list_for_each_entry_safe(entry, next, regions, list) { unlocked += vfio_unpin_pages_remote(dma, @@ -685,13 +686,15 @@ static size_t unmap_unpin_fast(struct vfio_domain *domain, struct vfio_dma *dma, dma_addr_t *iova, size_t len, phys_addr_t phys, long *unlocked, struct list_head *unmapped_list, - int *unmapped_cnt) + int *unmapped_cnt, + struct iommu_iotlb_gather *iotlb_gather) { size_t unmapped = 0; struct vfio_regions *entry = kzalloc(sizeof(*entry), GFP_KERNEL); if (entry) { - unmapped = iommu_unmap_fast(domain->domain, *iova, len); +
[PATCH 01/13] iommu: Remove empty iommu_tlb_range_add() callback from iommu_ops
Commit add02cfdc9bc ("iommu: Introduce Interface for IOMMU TLB Flushing") added three new TLB flushing operations to the IOMMU API so that the underlying driver operations can be batched when unmapping large regions of IO virtual address space. However, the ->iotlb_range_add() callback has not been implemented by any IOMMU drivers (amd_iommu.c implements it as an empty function, which incurs the overhead of an indirect branch). Instead, drivers either flush the entire IOTLB in the ->iotlb_sync() callback or perform the necessary invalidation during ->unmap(). Attempting to implement ->iotlb_range_add() for arm-smmu-v3.c revealed two major issues: 1. The page size used to map the region in the page-table is not known, and so it is not generally possible to issue TLB flushes in the most efficient manner. 2. The only mutable state passed to the callback is a pointer to the iommu_domain, which can be accessed concurrently and therefore requires expensive synchronisation to keep track of the outstanding flushes. Remove the callback entirely in preparation for extending ->unmap() and ->iotlb_sync() to update a token on the caller's stack. Signed-off-by: Will Deacon --- drivers/iommu/amd_iommu.c | 6 -- drivers/iommu/iommu.c | 3 --- drivers/vfio/vfio_iommu_type1.c | 1 - include/linux/iommu.h | 15 --- 4 files changed, 25 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index b607a92791d3..f93b148cf55e 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -3196,11 +3196,6 @@ static void amd_iommu_flush_iotlb_all(struct iommu_domain *domain) domain_flush_complete(dom); } -static void amd_iommu_iotlb_range_add(struct iommu_domain *domain, - unsigned long iova, size_t size) -{ -} - const struct iommu_ops amd_iommu_ops = { .capable = amd_iommu_capable, .domain_alloc = amd_iommu_domain_alloc, @@ -3219,7 +3214,6 @@ const struct iommu_ops amd_iommu_ops = { .is_attach_deferred = amd_iommu_is_attach_deferred, .pgsize_bitmap = AMD_IOMMU_PGSIZES, .flush_iotlb_all = amd_iommu_flush_iotlb_all, - .iotlb_range_add = amd_iommu_iotlb_range_add, .iotlb_sync = amd_iommu_flush_iotlb_all, }; diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 0c674d80c37f..6d7b25fe2474 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1903,9 +1903,6 @@ static size_t __iommu_unmap(struct iommu_domain *domain, if (!unmapped_page) break; - if (sync && ops->iotlb_range_add) - ops->iotlb_range_add(domain, iova, pgsize); - pr_debug("unmapped: iova 0x%lx size 0x%zx\n", iova, unmapped_page); diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 054391f30fa8..fad7fd8c167c 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -696,7 +696,6 @@ static size_t unmap_unpin_fast(struct vfio_domain *domain, if (!unmapped) { kfree(entry); } else { - iommu_tlb_range_add(domain->domain, *iova, unmapped); entry->iova = *iova; entry->phys = phys; entry->len = unmapped; diff --git a/include/linux/iommu.h b/include/linux/iommu.h index fdc355ccc570..1e21431262d9 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -201,7 +201,6 @@ struct iommu_sva_ops { * @map: map a physically contiguous memory region to an iommu domain * @unmap: unmap a physically contiguous memory region from an iommu domain * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain - * @iotlb_range_add: Add a given iova range to the flush queue for this domain * @iotlb_sync_map: Sync mappings created recently using @map to the hardware * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush *queue @@ -244,8 +243,6 @@ struct iommu_ops { size_t (*unmap)(struct iommu_domain *domain, unsigned long iova, size_t size); void (*flush_iotlb_all)(struct iommu_domain *domain); - void (*iotlb_range_add)(struct iommu_domain *domain, - unsigned long iova, size_t size); void (*iotlb_sync_map)(struct iommu_domain *domain); void (*iotlb_sync)(struct iommu_domain *domain); phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova); @@ -476,13 +473,6 @@ static inline void iommu_flush_tlb_all(struct iommu_domain *domain) domain->ops->flush_iotlb_all(domain); } -static inline void iommu_tlb_range_add(struct iommu_domain *domain, - unsigned long iova, size_t size) -{ - if
[PATCH 09/13] iommu/io-pgtable-arm: Call ->tlb_flush_walk() and ->tlb_flush_leaf()
Now that all IOMMU drivers using the io-pgtable API implement the ->tlb_flush_walk() and ->tlb_flush_leaf() callbacks, we can use them in the io-pgtable code instead of ->tlb_add_flush() immediately followed by ->tlb_sync(). Signed-off-by: Will Deacon --- drivers/iommu/io-pgtable-arm-v7s.c | 25 +++-- drivers/iommu/io-pgtable-arm.c | 17 - include/linux/io-pgtable.h | 14 ++ 3 files changed, 41 insertions(+), 15 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c index 116f97ee991e..8d4914fe73bc 100644 --- a/drivers/iommu/io-pgtable-arm-v7s.c +++ b/drivers/iommu/io-pgtable-arm-v7s.c @@ -493,9 +493,8 @@ static int arm_v7s_map(struct io_pgtable_ops *ops, unsigned long iova, * a chance for anything to kick off a table walk for the new iova. */ if (iop->cfg.quirks & IO_PGTABLE_QUIRK_TLBI_ON_MAP) { - io_pgtable_tlb_add_flush(iop, iova, size, -ARM_V7S_BLOCK_SIZE(2), false); - io_pgtable_tlb_sync(iop); + io_pgtable_tlb_flush_walk(iop, iova, size, + ARM_V7S_BLOCK_SIZE(2)); } else { wmb(); } @@ -541,8 +540,7 @@ static arm_v7s_iopte arm_v7s_split_cont(struct arm_v7s_io_pgtable *data, __arm_v7s_pte_sync(ptep, ARM_V7S_CONT_PAGES, >cfg); size *= ARM_V7S_CONT_PAGES; - io_pgtable_tlb_add_flush(iop, iova, size, size, true); - io_pgtable_tlb_sync(iop); + io_pgtable_tlb_flush_leaf(iop, iova, size, size); return pte; } @@ -637,9 +635,8 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data, for (i = 0; i < num_entries; i++) { if (ARM_V7S_PTE_IS_TABLE(pte[i], lvl)) { /* Also flush any partial walks */ - io_pgtable_tlb_add_flush(iop, iova, blk_size, - ARM_V7S_BLOCK_SIZE(lvl + 1), false); - io_pgtable_tlb_sync(iop); + io_pgtable_tlb_flush_walk(iop, iova, blk_size, + ARM_V7S_BLOCK_SIZE(lvl + 1)); ptep = iopte_deref(pte[i], lvl); __arm_v7s_free_table(ptep, lvl + 1, data); } else if (iop->cfg.quirks & IO_PGTABLE_QUIRK_NON_STRICT) { @@ -805,13 +802,19 @@ static void dummy_tlb_flush_all(void *cookie) WARN_ON(cookie != cfg_cookie); } -static void dummy_tlb_add_flush(unsigned long iova, size_t size, - size_t granule, bool leaf, void *cookie) +static void dummy_tlb_flush(unsigned long iova, size_t size, size_t granule, + void *cookie) { WARN_ON(cookie != cfg_cookie); WARN_ON(!(size & cfg_cookie->pgsize_bitmap)); } +static void dummy_tlb_add_flush(unsigned long iova, size_t size, + size_t granule, bool leaf, void *cookie) +{ + dummy_tlb_flush(iova, size, granule, cookie); +} + static void dummy_tlb_sync(void *cookie) { WARN_ON(cookie != cfg_cookie); @@ -819,6 +822,8 @@ static void dummy_tlb_sync(void *cookie) static const struct iommu_flush_ops dummy_tlb_ops = { .tlb_flush_all = dummy_tlb_flush_all, + .tlb_flush_walk = dummy_tlb_flush, + .tlb_flush_leaf = dummy_tlb_flush, .tlb_add_flush = dummy_tlb_add_flush, .tlb_sync = dummy_tlb_sync, }; diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 402f913b6f6d..b58338c86323 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -611,9 +611,8 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data, if (!iopte_leaf(pte, lvl, iop->fmt)) { /* Also flush any partial walks */ - io_pgtable_tlb_add_flush(iop, iova, size, - ARM_LPAE_GRANULE(data), false); - io_pgtable_tlb_sync(iop); + io_pgtable_tlb_flush_walk(iop, iova, size, + ARM_LPAE_GRANULE(data)); ptep = iopte_deref(pte, data); __arm_lpae_free_pgtable(data, lvl + 1, ptep); } else if (iop->cfg.quirks & IO_PGTABLE_QUIRK_NON_STRICT) { @@ -1069,13 +1068,19 @@ static void dummy_tlb_flush_all(void *cookie) WARN_ON(cookie != cfg_cookie); } -static void dummy_tlb_add_flush(unsigned long iova, size_t size, - size_t granule, bool leaf, void *cookie) +static void dummy_tlb_flush(unsigned long iova, size_t size, size_t granule, + void *cookie) { WARN_ON(cookie != cfg_cookie); WARN_ON(!(size &
Re: DMA-API: cacheline tracking ENOMEM, dma-debug disabled due to nouveau ?
On Wed, Aug 14, 2019 at 04:50:33PM +0200, Corentin Labbe wrote: > Hello > > Since lot of release (at least since 4.19), I hit the following error message: > DMA-API: cacheline tracking ENOMEM, dma-debug disabled > > After hitting that, I try to check who is creating so many DMA mapping and > see: > cat /sys/kernel/debug/dma-api/dump | cut -d' ' -f2 | sort | uniq -c > 6 ahci > 257 e1000e > 6 ehci-pci >5891 nouveau > 24 uhci_hcd > > Does nouveau having this high number of DMA mapping is normal ? Yeah seems perfectly fine for a gpu. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 hmm 04/11] misc/sgi-gru: use mmu_notifier_get/put for struct gru_mm_struct
On Wed, Aug 14, 2019 at 03:58:34PM +, Jason Gunthorpe wrote: > On Thu, Aug 08, 2019 at 12:25:56PM +0200, Christoph Hellwig wrote: > > Looks good, > > > > Reviewed-by: Christoph Hellwig > > Dimitri, are you OK with this patch? > I think this looks OK. Reviewed-by: Dimitri Sivanich ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 01/15] iommu/arm-smmu: Convert GR0 registers to bitfields
On 14/08/2019 18:20, Will Deacon wrote: On Fri, Aug 09, 2019 at 06:07:38PM +0100, Robin Murphy wrote: FIELD_PREP remains a terrible name, but the overall simplification will make further work on this stuff that much more manageable. This also serves as an audit of the header, wherein we can impose a consistent grouping and ordering of the offset and field definitions Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu-regs.h | 126 -- drivers/iommu/arm-smmu.c | 51 +++--- 2 files changed, 84 insertions(+), 93 deletions(-) diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h index 1c278f7ae888..d189f025537a 100644 --- a/drivers/iommu/arm-smmu-regs.h +++ b/drivers/iommu/arm-smmu-regs.h @@ -10,111 +10,101 @@ #ifndef _ARM_SMMU_REGS_H #define _ARM_SMMU_REGS_H +#include + /* Configuration registers */ #define ARM_SMMU_GR0_sCR0 0x0 -#define sCR0_CLIENTPD (1 << 0) -#define sCR0_GFRE (1 << 1) -#define sCR0_GFIE (1 << 2) -#define sCR0_EXIDENABLE(1 << 3) -#define sCR0_GCFGFRE (1 << 4) -#define sCR0_GCFGFIE (1 << 5) -#define sCR0_USFCFG(1 << 10) -#define sCR0_VMIDPNE (1 << 11) -#define sCR0_PTM (1 << 12) -#define sCR0_FB(1 << 13) -#define sCR0_VMID16EN (1 << 31) -#define sCR0_BSU_SHIFT 14 -#define sCR0_BSU_MASK 0x3 +#define sCR0_VMID16EN BIT(31) +#define sCR0_BSU GENMASK(15, 14) +#define sCR0_FBBIT(13) +#define sCR0_PTM BIT(12) +#define sCR0_VMIDPNE BIT(11) +#define sCR0_USFCFGBIT(10) +#define sCR0_GCFGFIE BIT(5) +#define sCR0_GCFGFRE BIT(4) +#define sCR0_EXIDENABLEBIT(3) +#define sCR0_GFIE BIT(2) +#define sCR0_GFRE BIT(1) +#define sCR0_CLIENTPD BIT(0) /* Auxiliary Configuration register */ #define ARM_SMMU_GR0_sACR 0x10 /* Identification registers */ #define ARM_SMMU_GR0_ID0 0x20 +#define ID0_S1TS BIT(30) +#define ID0_S2TS BIT(29) +#define ID0_NTSBIT(28) +#define ID0_SMSBIT(27) +#define ID0_ATOSNS BIT(26) +#define ID0_PTFS_NO_AARCH32BIT(25) +#define ID0_PTFS_NO_AARCH32S BIT(24) +#define ID0_CTTW BIT(14) +#define ID0_NUMIRPTGENMASK(23, 16) nit: assuming this should be above ID0_CTTW so things are in descending bit order? Bah, indeed it should. Fixed now. Other than that, looks good to me. Thanks! Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 01/15] iommu/arm-smmu: Convert GR0 registers to bitfields
On Fri, Aug 09, 2019 at 06:07:38PM +0100, Robin Murphy wrote: > FIELD_PREP remains a terrible name, but the overall simplification will > make further work on this stuff that much more manageable. This also > serves as an audit of the header, wherein we can impose a consistent > grouping and ordering of the offset and field definitions > > Signed-off-by: Robin Murphy > --- > drivers/iommu/arm-smmu-regs.h | 126 -- > drivers/iommu/arm-smmu.c | 51 +++--- > 2 files changed, 84 insertions(+), 93 deletions(-) > > diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h > index 1c278f7ae888..d189f025537a 100644 > --- a/drivers/iommu/arm-smmu-regs.h > +++ b/drivers/iommu/arm-smmu-regs.h > @@ -10,111 +10,101 @@ > #ifndef _ARM_SMMU_REGS_H > #define _ARM_SMMU_REGS_H > > +#include > + > /* Configuration registers */ > #define ARM_SMMU_GR0_sCR00x0 > -#define sCR0_CLIENTPD(1 << 0) > -#define sCR0_GFRE(1 << 1) > -#define sCR0_GFIE(1 << 2) > -#define sCR0_EXIDENABLE (1 << 3) > -#define sCR0_GCFGFRE (1 << 4) > -#define sCR0_GCFGFIE (1 << 5) > -#define sCR0_USFCFG (1 << 10) > -#define sCR0_VMIDPNE (1 << 11) > -#define sCR0_PTM (1 << 12) > -#define sCR0_FB (1 << 13) > -#define sCR0_VMID16EN(1 << 31) > -#define sCR0_BSU_SHIFT 14 > -#define sCR0_BSU_MASK0x3 > +#define sCR0_VMID16ENBIT(31) > +#define sCR0_BSU GENMASK(15, 14) > +#define sCR0_FB BIT(13) > +#define sCR0_PTM BIT(12) > +#define sCR0_VMIDPNE BIT(11) > +#define sCR0_USFCFG BIT(10) > +#define sCR0_GCFGFIE BIT(5) > +#define sCR0_GCFGFRE BIT(4) > +#define sCR0_EXIDENABLE BIT(3) > +#define sCR0_GFIEBIT(2) > +#define sCR0_GFREBIT(1) > +#define sCR0_CLIENTPDBIT(0) > > /* Auxiliary Configuration register */ > #define ARM_SMMU_GR0_sACR0x10 > > /* Identification registers */ > #define ARM_SMMU_GR0_ID0 0x20 > +#define ID0_S1TS BIT(30) > +#define ID0_S2TS BIT(29) > +#define ID0_NTS BIT(28) > +#define ID0_SMS BIT(27) > +#define ID0_ATOSNS BIT(26) > +#define ID0_PTFS_NO_AARCH32 BIT(25) > +#define ID0_PTFS_NO_AARCH32S BIT(24) > +#define ID0_CTTW BIT(14) > +#define ID0_NUMIRPT GENMASK(23, 16) nit: assuming this should be above ID0_CTTW so things are in descending bit order? Other than that, looks good to me. Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 20/22] iommu/vt-d: Add bind guest PASID support
On Fri, 5 Jul 2019 10:21:27 +0800 Lu Baolu wrote: > Hi Jacob, > > On 6/28/19 4:22 AM, Jacob Pan wrote: > >>> + } > >>> + refcount_set(>refs, 0); > >>> + ioasid_set_data(data->hpasid, svm); > >>> + INIT_LIST_HEAD_RCU(>devs); > >>> + INIT_LIST_HEAD(>list); > >>> + > >>> + mmput(svm->mm); > >>> + } > >>> + sdev = kzalloc(sizeof(*sdev), GFP_KERNEL); > >>> + if (!sdev) { > >>> + ret = -ENOMEM; > >>> + goto out; > >> I think you need to clean up svm if its device list is empty here, > >> as you said above: > >> > > No, we come here only if the device list is not empty and the new > > device to bind is different than any existing device in the list. > > If we cannot allocate memory for the new device, should not free > > the existing SVM, right? > > > > I'm sorry, but the code doesn't show this. We come here even an svm > data structure was newly created with an empty device list. I post > the code below to ensure that we are reading a same piece of code. > Sorry for the delay. You are right, I need to clean up svm if device list is empty. Thanks! > mutex_lock(_mutex); > svm = ioasid_find(NULL, data->hpasid, NULL); > if (IS_ERR(svm)) { > ret = PTR_ERR(svm); > goto out; > } > if (svm) { > /* > * If we found svm for the PASID, there must be at > * least one device bond, otherwise svm should be > freed. */ > BUG_ON(list_empty(>devs)); > > for_each_svm_dev() { > /* In case of multiple sub-devices of the > same pdev assigned, we should > * allow multiple bind calls with the same > PASID and pdev. > */ > sdev->users++; > goto out; > } > } else { > /* We come here when PASID has never been bond to a > device. */ > svm = kzalloc(sizeof(*svm), GFP_KERNEL); > if (!svm) { > ret = -ENOMEM; > goto out; > } > /* REVISIT: upper layer/VFIO can track host process > that bind the PASID. > * ioasid_set = mm might be sufficient for vfio to > check pasid VMM > * ownership. > */ > svm->mm = get_task_mm(current); > svm->pasid = data->hpasid; > if (data->flags & IOMMU_SVA_GPASID_VAL) { > svm->gpasid = data->gpasid; > svm->flags &= SVM_FLAG_GUEST_PASID; > } > refcount_set(>refs, 0); > ioasid_set_data(data->hpasid, svm); > INIT_LIST_HEAD_RCU(>devs); > INIT_LIST_HEAD(>list); > > mmput(svm->mm); > } > sdev = kzalloc(sizeof(*sdev), GFP_KERNEL); > if (!sdev) { > ret = -ENOMEM; > goto out; > } > sdev->dev = dev; > sdev->users = 1; > > Best regards, > Baolu [Jacob Pan] ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 5.2 085/144] iommu/vt-d: Check if domain->pgd was allocated
[ Upstream commit 3ee9eca760e7d0b68c55813243de66bbb499dc3b ] There is a couple of places where on domain_init() failure domain_exit() is called. While currently domain_init() can fail only if alloc_pgtable_page() has failed. Make domain_exit() check if domain->pgd present, before calling domain_unmap(), as it theoretically should crash on clearing pte entries in dma_pte_clear_level(). Cc: David Woodhouse Cc: Joerg Roedel Cc: Lu Baolu Cc: iommu@lists.linux-foundation.org Signed-off-by: Dmitry Safonov Reviewed-by: Lu Baolu Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin --- drivers/iommu/intel-iommu.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 2101601adf57d..1ad24367373f4 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -1900,7 +1900,6 @@ static int domain_init(struct dmar_domain *domain, struct intel_iommu *iommu, static void domain_exit(struct dmar_domain *domain) { - struct page *freelist; /* Remove associated devices and clear attached or cached domains */ rcu_read_lock(); @@ -1910,9 +1909,12 @@ static void domain_exit(struct dmar_domain *domain) /* destroy iovas */ put_iova_domain(>iovad); - freelist = domain_unmap(domain, 0, DOMAIN_MAX_PFN(domain->gaw)); + if (domain->pgd) { + struct page *freelist; - dma_free_pagelist(freelist); + freelist = domain_unmap(domain, 0, DOMAIN_MAX_PFN(domain->gaw)); + dma_free_pagelist(freelist); + } free_domain_mem(domain); } -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 hmm 08/11] drm/radeon: use mmu_notifier_get/put for struct radeon_mn
On Tue, Aug 06, 2019 at 08:15:45PM -0300, Jason Gunthorpe wrote: > From: Jason Gunthorpe > > radeon is using a device global hash table to track what mmu_notifiers > have been registered on struct mm. This is better served with the new > get/put scheme instead. > > radeon has a bug where it was not blocking notifier release() until all > the BO's had been invalidated. This could result in a use after free of > pages the BOs. This is tied into a second bug where radeon left the > notifiers running endlessly even once the interval tree became > empty. This could result in a use after free with module unload. > > Both are fixed by changing the lifetime model, the BOs exist in the > interval tree with their natural lifetimes independent of the mm_struct > lifetime using the get/put scheme. The release runs synchronously and just > does invalidate_start across the entire interval tree to create the > required DMA fence. > > Additions to the interval tree after release are already impossible as > only current->mm is used during the add. > > Signed-off-by: Jason Gunthorpe > drivers/gpu/drm/radeon/radeon.h| 3 - > drivers/gpu/drm/radeon/radeon_device.c | 2 - > drivers/gpu/drm/radeon/radeon_drv.c| 2 + > drivers/gpu/drm/radeon/radeon_mn.c | 157 ++--- > 4 files changed, 38 insertions(+), 126 deletions(-) AMD team: Are you OK with this patch? Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 hmm 04/11] misc/sgi-gru: use mmu_notifier_get/put for struct gru_mm_struct
On Thu, Aug 08, 2019 at 12:25:56PM +0200, Christoph Hellwig wrote: > Looks good, > > Reviewed-by: Christoph Hellwig Dimitri, are you OK with this patch? Thanks, Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 6/6] driver core: initialize a default DMA mask for platform device
On 11/08/2019 09:05, Christoph Hellwig wrote: We still treat devices without a DMA mask as defaulting to 32-bits for both mask, but a few releases ago we've started warning about such cases, as they require special cases to work around this sloppyness. Add a dma_mask field to struct platform_object so that we can initialize s/object/device/ the dma_mask pointer in struct device and initialize both masks to 32-bits by default. Architectures can still override this in arch_setup_pdev_archdata if needed. Note that the code looks a little odd with the various conditionals because we have to support platform_device structures that are statically allocated. This would be a good point to also get rid of the long-standing bodge in platform_device_register_full(). Signed-off-by: Christoph Hellwig --- drivers/base/platform.c | 15 +-- include/linux/platform_device.h | 1 + 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/base/platform.c b/drivers/base/platform.c index ec974ba9c0c4..b216fcb0a8af 100644 --- a/drivers/base/platform.c +++ b/drivers/base/platform.c @@ -264,6 +264,17 @@ struct platform_object { char name[]; }; +static void setup_pdev_archdata(struct platform_device *pdev) Bikeshed: painting the generic DMA API properties as "archdata" feels a bit off-target :/ +{ + if (!pdev->dev.coherent_dma_mask) + pdev->dev.coherent_dma_mask = DMA_BIT_MASK(32); + if (!pdev->dma_mask) + pdev->dma_mask = DMA_BIT_MASK(32); + if (!pdev->dev.dma_mask) + pdev->dev.dma_mask = >dma_mask; + arch_setup_pdev_archdata(pdev); AFAICS m68k's implementation of that arch hook becomes entirely redundant after this change, so may as well go. That would just leave powerpc's actual archdata, which at a glance looks like it could probably be cleaned up with not *too* much trouble. Robin. +}; + /** * platform_device_put - destroy a platform device * @pdev: platform device to free @@ -310,7 +321,7 @@ struct platform_device *platform_device_alloc(const char *name, int id) pa->pdev.id = id; device_initialize(>pdev.dev); pa->pdev.dev.release = platform_device_release; - arch_setup_pdev_archdata(>pdev); + setup_pdev_archdata(>pdev); } return pa ? >pdev : NULL; @@ -512,7 +523,7 @@ EXPORT_SYMBOL_GPL(platform_device_del); int platform_device_register(struct platform_device *pdev) { device_initialize(>dev); - arch_setup_pdev_archdata(pdev); + setup_pdev_archdata(pdev); return platform_device_add(pdev); } EXPORT_SYMBOL_GPL(platform_device_register); diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h index 9bc36b589827..a2abde2aef25 100644 --- a/include/linux/platform_device.h +++ b/include/linux/platform_device.h @@ -24,6 +24,7 @@ struct platform_device { int id; boolid_auto; struct device dev; + u64 dma_mask; u32 num_resources; struct resource *resource; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
DMA-API: cacheline tracking ENOMEM, dma-debug disabled due to nouveau ?
Hello Since lot of release (at least since 4.19), I hit the following error message: DMA-API: cacheline tracking ENOMEM, dma-debug disabled After hitting that, I try to check who is creating so many DMA mapping and see: cat /sys/kernel/debug/dma-api/dump | cut -d' ' -f2 | sort | uniq -c 6 ahci 257 e1000e 6 ehci-pci 5891 nouveau 24 uhci_hcd Does nouveau having this high number of DMA mapping is normal ? Regards ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v9 08/21] iommu/io-pgtable-arm-v7s: Extend MediaTek 4GB Mode
Hi Yong Wu, Sorry, but I'm still deeply confused by this patch. On Sat, Aug 10, 2019 at 03:58:08PM +0800, Yong Wu wrote: > MediaTek extend the arm v7s descriptor to support the dram over 4GB. > > In the mt2712 and mt8173, it's called "4GB mode", the physical address > is from 0x4000_ to 0x1_3fff_, but from EMI point of view, it > is remapped to high address from 0x1__ to 0x1__, the > bit32 is always enabled. thus, in the M4U, we always enable the bit9 > for all PTEs which means to enable bit32 of physical address. Here is > the detailed remap relationship in the "4GB mode": > CPU PA ->HW PA > 0x4000_ 0x1_4000_ (Add bit32) > 0x8000_ 0x1_8000_ ... > 0xc000_ 0x1_c000_ ... > 0x1__0x1__ (No change) So in this example, there are no PAs below 0x4000_ yet you later add code to deal with that: > + /* Workaround for MTK 4GB Mode: Add BIT32 only when PA < 0x4000_.*/ > + if (cfg->oas == ARM_V7S_MTK_4GB_OAS && paddr < 0x4000UL) > + paddr |= BIT_ULL(32); Why? Mainline currently doesn't do anything like this for the "4gb mode" support as far as I can tell. In fact, we currently unconditionally set bit 32 in the physical address returned by iova_to_phys() which wouldn't match your CPU PAs listed above, so I'm confused about how this is supposed to work. The way I would like this quirk to work is that the io-pgtable code basically sets bit 9 in the pte when bit 32 is set in the physical address, and sets bit 4 in the pte when bit 33 is set in the physical address. It would then do the opposite when converting a pte to a physical address. That way, your driver can call the page table code directly with the high addresses and we don't have to do any manual offsetting or range checking in the page table code. Please can you explain to me why the diff below doesn't work on top of this series? I'm happy to chat on IRC if you think it would be easier, because I have a horrible feeling that we've been talking past each other and I'd like to see this support merged for 5.4. Will --->8 diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c index ab12ef5f8b03..d8d84617c822 100644 --- a/drivers/iommu/io-pgtable-arm-v7s.c +++ b/drivers/iommu/io-pgtable-arm-v7s.c @@ -184,7 +184,7 @@ static arm_v7s_iopte paddr_to_iopte(phys_addr_t paddr, int lvl, arm_v7s_iopte pte = paddr & ARM_V7S_LVL_MASK(lvl); if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_EXT) { - if ((paddr & BIT_ULL(32)) || cfg->oas == ARM_V7S_MTK_4GB_OAS) + if (paddr & BIT_ULL(32)) pte |= ARM_V7S_ATTR_MTK_PA_BIT32; if (paddr & BIT_ULL(33)) pte |= ARM_V7S_ATTR_MTK_PA_BIT33; @@ -206,17 +206,14 @@ static phys_addr_t iopte_to_paddr(arm_v7s_iopte pte, int lvl, mask = ARM_V7S_LVL_MASK(lvl); paddr = pte & mask; - if (cfg->oas == 32 || !(cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_EXT)) - return paddr; - if (pte & ARM_V7S_ATTR_MTK_PA_BIT33) - paddr |= BIT_ULL(33); + if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_EXT) { + if (pte & ARM_V7S_ATTR_MTK_PA_BIT32) + paddr |= BIT_ULL(32); + if (pte & ARM_V7S_ATTR_MTK_PA_BIT33) + paddr |= BIT_ULL(33); + } - /* Workaround for MTK 4GB Mode: Add BIT32 only when PA < 0x4000_.*/ - if (cfg->oas == ARM_V7S_MTK_4GB_OAS && paddr < 0x4000UL) - paddr |= BIT_ULL(32); - else if (pte & ARM_V7S_ATTR_MTK_PA_BIT32) - paddr |= BIT_ULL(32); return paddr; } diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index d5b9454352fd..3ae54dedede0 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -286,7 +286,7 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom) if (!IS_ENABLED(CONFIG_PHYS_ADDR_T_64BIT)) dom->cfg.oas = 32; else if (data->enable_4GB) - dom->cfg.oas = ARM_V7S_MTK_4GB_OAS; + dom->cfg.oas = 33; else dom->cfg.oas = 34; diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index 27337395bd42..a2a52c349fe4 100644 --- a/include/linux/io-pgtable.h +++ b/include/linux/io-pgtable.h @@ -113,8 +113,6 @@ struct io_pgtable_cfg { }; }; -#define ARM_V7S_MTK_4GB_OAS33 - /** * struct io_pgtable_ops - Page table manipulation API for IOMMU drivers. *
[GIT PULL] dma mapping fixes for 5.3-rc
The following changes since commit 451577f3e3a9bf1861218641dbbf98e214e77851: Merge tag 'kbuild-fixes-v5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild (2019-08-09 20:31:04 -0700) are available in the Git repository at: git://git.infradead.org/users/hch/dma-mapping.git tags/dma-mapping-5.3-4 for you to fetch changes up to 33dcb37cef741294b481f4d889a465b8091f11bf: dma-mapping: fix page attributes for dma_mmap_* (2019-08-10 19:52:45 +0200) dma-mapping fixes for 5.3-rc - fix the handling of the bus_dma_mask in dma_get_required_mask, which caused a regression in this merge window (Lucas Stach) - fix a regression in the handling of DMA_ATTR_NO_KERNEL_MAPPING (me) - fix dma_mmap_coherent to not cause page attribute mismatches on coherent architectures like x86 (me) Christoph Hellwig (2): dma-direct: fix DMA_ATTR_NO_KERNEL_MAPPING dma-mapping: fix page attributes for dma_mmap_* Lucas Stach (1): dma-direct: don't truncate dma_required_mask to bus addressing capabilities arch/arm/mm/dma-mapping.c| 4 +--- arch/arm64/mm/dma-mapping.c | 4 +--- arch/powerpc/Kconfig | 1 - arch/powerpc/kernel/Makefile | 3 +-- arch/powerpc/kernel/dma-common.c | 17 - drivers/iommu/dma-iommu.c| 6 +++--- include/linux/dma-noncoherent.h | 13 + kernel/dma/direct.c | 10 +- kernel/dma/mapping.c | 19 ++- kernel/dma/remap.c | 2 +- 10 files changed, 39 insertions(+), 40 deletions(-) delete mode 100644 arch/powerpc/kernel/dma-common.c ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[git pull] IOMMU Fixes for Linux v5.3-rc4
Hi Linus, The following changes since commit e21a712a9685488f5ce80495b37b9fdbe96c230d: Linux 5.3-rc3 (2019-08-04 18:40:12 -0700) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git tags/iommu-fixes-v5.3-rc4 for you to fetch changes up to 3a18844dcf89e636b2d0cbf577e3963b0bcb6d23: iommu/vt-d: Fix possible use-after-free of private domain (2019-08-09 17:35:25 +0200) IOMMU Fixes for Linux v5.3-rc4 Including: - A couple more fixes for the Intel VT-d driver for bugs introduced during the recent conversion of this driver to use IOMMU core default domains. - Fix for common dma-iommu code to make sure MSI mappings happen in the correct domain for a device. - Fix a corner case in the handling of sg-lists in dma-iommu code that might cause dma_length to be truncated. - Mark a switch as fall-through in arm-smmu code. Anders Roxell (1): iommu/arm-smmu: Mark expected switch fall-through Lu Baolu (4): iommu/vt-d: Detach domain when move device out of group iommu/vt-d: Correctly check format of page table in debugfs iommu/vt-d: Detach domain before using a private one iommu/vt-d: Fix possible use-after-free of private domain Robin Murphy (2): iommu/dma: Handle MSI mappings separately iommu/dma: Handle SG length overflow better drivers/iommu/arm-smmu-v3.c | 4 ++-- drivers/iommu/dma-iommu.c | 19 +++ drivers/iommu/intel-iommu-debugfs.c | 2 +- drivers/iommu/intel-iommu.c | 11 +-- 4 files changed, 23 insertions(+), 13 deletions(-) Please pull. Thanks, Joerg
[PATCH 2/2] microblaze: use the generic dma coherent remap allocator
This switches to using common code for the DMA allocations, including potential use of the CMA allocator if configured. Switching to the generic code enables DMA allocations from atomic context, which is required by the DMA API documentation, and also adds various other minor features drivers start relying upon. It also makes sure we have on tested code base for all architectures that require uncached pte bits for coherent DMA allocations. Signed-off-by: Christoph Hellwig --- arch/microblaze/Kconfig | 1 + arch/microblaze/mm/consistent.c | 152 +--- 2 files changed, 5 insertions(+), 148 deletions(-) diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig index a0d749c309f3..e477896fbae6 100644 --- a/arch/microblaze/Kconfig +++ b/arch/microblaze/Kconfig @@ -17,6 +17,7 @@ config MICROBLAZE select TIMER_OF select CLONE_BACKWARDS3 select COMMON_CLK + select DMA_DIRECT_REMAP if MMU select GENERIC_ATOMIC64 select GENERIC_CLOCKEVENTS select GENERIC_CPU_DEVICES diff --git a/arch/microblaze/mm/consistent.c b/arch/microblaze/mm/consistent.c index 1a859e8b58c2..0e0f733eb846 100644 --- a/arch/microblaze/mm/consistent.c +++ b/arch/microblaze/mm/consistent.c @@ -4,43 +4,16 @@ * Copyright (C) 2010 Michal Simek * Copyright (C) 2010 PetaLogix * Copyright (C) 2005 John Williams - * - * Based on PowerPC version derived from arch/arm/mm/consistent.c - * Copyright (C) 2001 Dan Malek (dma...@jlc.net) - * Copyright (C) 2000 Russell King */ -#include -#include -#include #include -#include #include #include -#include -#include #include -#include -#include -#include #include -#include -#include -#include -#include -#include -#include #include - -#include -#include -#include -#include -#include -#include -#include #include -#include +#include void arch_dma_prep_coherent(struct page *page, size_t size) { @@ -84,126 +57,9 @@ void *cached_kernel_address(void *ptr) return (void *)(addr & ~UNCACHED_SHADOW_MASK); } #else /* CONFIG_MMU */ -void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, - gfp_t gfp, unsigned long attrs) -{ - unsigned long order, vaddr; - void *ret; - unsigned int i, err = 0; - struct page *page, *end; - phys_addr_t pa; - struct vm_struct *area; - unsigned long va; - - if (in_interrupt()) - BUG(); - - /* Only allocate page size areas. */ - size = PAGE_ALIGN(size); - order = get_order(size); - - vaddr = __get_free_pages(gfp | __GFP_ZERO, order); - if (!vaddr) - return NULL; - - /* -* we need to ensure that there are no cachelines in use, -* or worse dirty in this area. -*/ - arch_dma_prep_coherent(virt_to_page((unsigned long)vaddr), size); - - /* Allocate some common virtual space to map the new pages. */ - area = get_vm_area(size, VM_ALLOC); - if (!area) { - free_pages(vaddr, order); - return NULL; - } - va = (unsigned long) area->addr; - ret = (void *)va; - - /* This gives us the real physical address of the first page. */ - *dma_handle = pa = __virt_to_phys(vaddr); - - /* -* free wasted pages. We skip the first page since we know -* that it will have count = 1 and won't require freeing. -* We also mark the pages in use as reserved so that -* remap_page_range works. -*/ - page = virt_to_page(vaddr); - end = page + (1 << order); - - split_page(page, order); - - for (i = 0; i < size && err == 0; i += PAGE_SIZE) { - /* MS: This is the whole magic - use cache inhibit pages */ - err = map_page(va + i, pa + i, _PAGE_KERNEL | _PAGE_NO_CACHE); - - SetPageReserved(page); - page++; - } - - /* Free the otherwise unused pages. */ - while (page < end) { - __free_page(page); - page++; - } - - if (err) { - free_pages(vaddr, order); - return NULL; - } - - return ret; -} - -static pte_t *consistent_virt_to_pte(void *vaddr) -{ - unsigned long addr = (unsigned long)vaddr; - - return pte_offset_kernel(pmd_offset(pgd_offset_k(addr), addr), addr); -} - -long arch_dma_coherent_to_pfn(struct device *dev, void *vaddr, - dma_addr_t dma_addr) +static int __init atomic_pool_init(void) { - pte_t *ptep = consistent_virt_to_pte(vaddr); - - if (pte_none(*ptep) || !pte_present(*ptep)) - return 0; - - return pte_pfn(*ptep); -} - -/* - * free page(s) as defined by the above mapping. - */ -void arch_dma_free(struct device *dev, size_t size, void *vaddr, - dma_addr_t dma_addr, unsigned long attrs) -{ - struct page *page; - - if
[PATCH 1/2] microblaze/nommu: use the generic uncached segment support
Stop providing our own arch alloc/free hooks for nommu platforms and just expose the segment offset and use the generic dma-direct allocator. Signed-off-by: Christoph Hellwig --- arch/microblaze/Kconfig | 2 + arch/microblaze/mm/consistent.c | 93 +++-- 2 files changed, 43 insertions(+), 52 deletions(-) diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig index d411de05b628..a0d749c309f3 100644 --- a/arch/microblaze/Kconfig +++ b/arch/microblaze/Kconfig @@ -5,9 +5,11 @@ config MICROBLAZE select ARCH_NO_SWAP select ARCH_HAS_BINFMT_FLAT if !MMU select ARCH_HAS_DMA_COHERENT_TO_PFN if MMU + select ARCH_HAS_DMA_PREP_COHERENT select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_SYNC_DMA_FOR_CPU select ARCH_HAS_SYNC_DMA_FOR_DEVICE + select ARCH_HAS_UNCACHED_SEGMENT if !MMU select ARCH_MIGHT_HAVE_PC_PARPORT select ARCH_NO_COHERENT_DMA_MMAP if !MMU select ARCH_WANT_IPC_PARSE_VERSION diff --git a/arch/microblaze/mm/consistent.c b/arch/microblaze/mm/consistent.c index bc7042209c57..1a859e8b58c2 100644 --- a/arch/microblaze/mm/consistent.c +++ b/arch/microblaze/mm/consistent.c @@ -42,21 +42,48 @@ #include #include -#ifndef CONFIG_MMU -/* I have to use dcache values because I can't relate on ram size */ -# define UNCACHED_SHADOW_MASK (cpuinfo.dcache_high - cpuinfo.dcache_base + 1) -#endif +void arch_dma_prep_coherent(struct page *page, size_t size) +{ + phys_addr_t paddr = page_to_phys(page); + + flush_dcache_range(paddr, paddr + size); +} +#ifndef CONFIG_MMU /* - * Consistent memory allocators. Used for DMA devices that want to - * share uncached memory with the processor core. - * My crufty no-MMU approach is simple. In the HW platform we can optionally - * mirror the DDR up above the processor cacheable region. So, memory accessed - * in this mirror region will not be cached. It's alloced from the same - * pool as normal memory, but the handle we return is shifted up into the - * uncached region. This will no doubt cause big problems if memory allocated - * here is not also freed properly. -- JW + * Consistent memory allocators. Used for DMA devices that want to share + * uncached memory with the processor core. My crufty no-MMU approach is + * simple. In the HW platform we can optionally mirror the DDR up above the + * processor cacheable region. So, memory accessed in this mirror region will + * not be cached. It's alloced from the same pool as normal memory, but the + * handle we return is shifted up into the uncached region. This will no doubt + * cause big problems if memory allocated here is not also freed properly. -- JW + * + * I have to use dcache values because I can't relate on ram size: */ +#ifdef CONFIG_XILINX_UNCACHED_SHADOW +#define UNCACHED_SHADOW_MASK (cpuinfo.dcache_high - cpuinfo.dcache_base + 1) +#else +#define UNCACHED_SHADOW_MASK 0 +#endif /* CONFIG_XILINX_UNCACHED_SHADOW */ + +void *uncached_kernel_address(void *ptr) +{ + unsigned long addr = (unsigned long)ptr; + + addr |= UNCACHED_SHADOW_MASK; + if (addr > cpuinfo.dcache_base && addr < cpuinfo.dcache_high) + pr_warn("ERROR: Your cache coherent area is CACHED!!!\n"); + return (void *)addr; +} + +void *cached_kernel_address(void *ptr) +{ + unsigned long addr = (unsigned long)ptr; + + return (void *)(addr & ~UNCACHED_SHADOW_MASK); +} +#else /* CONFIG_MMU */ void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs) { @@ -64,12 +91,9 @@ void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, void *ret; unsigned int i, err = 0; struct page *page, *end; - -#ifdef CONFIG_MMU phys_addr_t pa; struct vm_struct *area; unsigned long va; -#endif if (in_interrupt()) BUG(); @@ -86,26 +110,8 @@ void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, * we need to ensure that there are no cachelines in use, * or worse dirty in this area. */ - flush_dcache_range(virt_to_phys((void *)vaddr), - virt_to_phys((void *)vaddr) + size); - -#ifndef CONFIG_MMU - ret = (void *)vaddr; - /* -* Here's the magic! Note if the uncached shadow is not implemented, -* it's up to the calling code to also test that condition and make -* other arranegments, such as manually flushing the cache and so on. -*/ -# ifdef CONFIG_XILINX_UNCACHED_SHADOW - ret = (void *)((unsigned) ret | UNCACHED_SHADOW_MASK); -# endif - if ((unsigned int)ret > cpuinfo.dcache_base && - (unsigned int)ret < cpuinfo.dcache_high) - pr_warn("ERROR: Your cache coherent area is CACHED!!!\n"); + arch_dma_prep_coherent(virt_to_page((unsigned
convert microblaze to the generic dma remap allocator
Hi Michal, can you take a look at this patch that moves microblaze over to the generic DMA remap allocator? I've been trying to slowly get all architectures over to the generic code, and microblaze is one that seems very straightfoward to convert. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] sh: use the generic dma coherent remap allocator
This switches to using common code for the DMA allocations, including potential use of the CMA allocator if configured. Switching to the generic code enables DMA allocations from atomic context, which is required by the DMA API documentation, and also adds various other minor features drivers start relying upon. It also makes sure we have on tested code base for all architectures that require uncached pte bits for coherent DMA allocations. Signed-off-by: Christoph Hellwig --- arch/sh/Kconfig | 2 ++ arch/sh/kernel/dma-coherent.c | 57 +-- 2 files changed, 10 insertions(+), 49 deletions(-) diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig index 6b1b5941b618..21eefe7c4ba6 100644 --- a/arch/sh/Kconfig +++ b/arch/sh/Kconfig @@ -158,7 +158,9 @@ config DMA_COHERENT config DMA_NONCOHERENT def_bool !DMA_COHERENT + select ARCH_HAS_DMA_PREP_COHERENT select ARCH_HAS_SYNC_DMA_FOR_DEVICE + select DMA_DIRECT_REMAP config PGTABLE_LEVELS default 3 if X2TLB diff --git a/arch/sh/kernel/dma-coherent.c b/arch/sh/kernel/dma-coherent.c index b17514619b7e..2f0e2f2d1f9c 100644 --- a/arch/sh/kernel/dma-coherent.c +++ b/arch/sh/kernel/dma-coherent.c @@ -3,60 +3,13 @@ * Copyright (C) 2004 - 2007 Paul Mundt */ #include -#include #include -#include #include #include -void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, - gfp_t gfp, unsigned long attrs) +void arch_dma_prep_coherent(struct page *page, size_t size) { - void *ret, *ret_nocache; - int order = get_order(size); - - gfp |= __GFP_ZERO; - - ret = (void *)__get_free_pages(gfp, order); - if (!ret) - return NULL; - - /* -* Pages from the page allocator may have data present in -* cache. So flush the cache before using uncached memory. -*/ - arch_sync_dma_for_device(dev, virt_to_phys(ret), size, - DMA_BIDIRECTIONAL); - - ret_nocache = (void __force *)ioremap_nocache(virt_to_phys(ret), size); - if (!ret_nocache) { - free_pages((unsigned long)ret, order); - return NULL; - } - - split_page(pfn_to_page(virt_to_phys(ret) >> PAGE_SHIFT), order); - - *dma_handle = virt_to_phys(ret); - if (!WARN_ON(!dev)) - *dma_handle -= PFN_PHYS(dev->dma_pfn_offset); - - return ret_nocache; -} - -void arch_dma_free(struct device *dev, size_t size, void *vaddr, - dma_addr_t dma_handle, unsigned long attrs) -{ - int order = get_order(size); - unsigned long pfn = (dma_handle >> PAGE_SHIFT); - int k; - - if (!WARN_ON(!dev)) - pfn += dev->dma_pfn_offset; - - for (k = 0; k < (1 << order); k++) - __free_pages(pfn_to_page(pfn + k), 0); - - iounmap(vaddr); + __flush_purge_region(page_address(page), size); } void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr, @@ -78,3 +31,9 @@ void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr, BUG(); } } + +static int __init atomic_pool_init(void) +{ + return dma_atomic_pool_init(GFP_KERNEL, pgprot_noncached(PAGE_KERNEL)); +} +postcore_initcall(atomic_pool_init); -- 2.20.1
convert sh to the generic dma remap allocator
Hi Rich and Yoshinori, can you take a quick look at this patch that moves sh over to the generic DMA remap allocator? I've been trying to slowly get all architectures over to the generic code, and I'd like merge this one for 5.4 now that it has been posted a handful of times without any feedback.
[PATCH 05/10] ia64: Get rid of iommu_pass_through
From: Joerg Roedel This variable has no users anymore so it can be removed. Signed-off-by: Joerg Roedel --- arch/ia64/include/asm/iommu.h | 2 -- arch/ia64/kernel/pci-dma.c| 2 -- 2 files changed, 4 deletions(-) diff --git a/arch/ia64/include/asm/iommu.h b/arch/ia64/include/asm/iommu.h index 7429a72f3f92..92aceef63710 100644 --- a/arch/ia64/include/asm/iommu.h +++ b/arch/ia64/include/asm/iommu.h @@ -8,10 +8,8 @@ extern void no_iommu_init(void); #ifdef CONFIG_INTEL_IOMMU extern int force_iommu, no_iommu; -extern int iommu_pass_through; extern int iommu_detected; #else -#define iommu_pass_through (0) #define no_iommu (1) #define iommu_detected (0) #endif diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c index fe988c49f01c..f5d49cd3fbb0 100644 --- a/arch/ia64/kernel/pci-dma.c +++ b/arch/ia64/kernel/pci-dma.c @@ -22,8 +22,6 @@ int force_iommu __read_mostly = 1; int force_iommu __read_mostly; #endif -int iommu_pass_through; - static int __init pci_iommu_init(void) { if (iommu_detected) -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 09/10] iommu: Disable passthrough mode when SME is active
From: Joerg Roedel Using Passthrough mode when SME is active causes certain devices to use the SWIOTLB bounce buffer. The bounce buffer code has an upper limit of 256kb for the size of DMA allocations, which is too small for certain devices and causes them to fail. With this patch we enable IOMMU by default when SME is active in the system, making the default configuration work for more systems than it does now. Users that don't want IOMMUs to be enabled still can disable them with kernel parameters. Signed-off-by: Joerg Roedel --- drivers/iommu/iommu.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 96cc7cc8ab21..4bc9025a9975 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -119,6 +119,11 @@ static int __init iommu_subsys_init(void) iommu_set_default_passthrough(); else iommu_set_default_translated(); + + if (iommu_default_passthrough() && sme_active()) { + pr_info("SME detected - Disabling default IOMMU Passthrough\n"); + iommu_set_default_translated(); + } } pr_info("Default domain type: %s %s\n", -- 2.17.1
[PATCH 04/10] x86/dma: Get rid of iommu_pass_through
From: Joerg Roedel This variable has no users anymore. Remove it and tell the IOMMU code via its new functions about requested DMA modes. Signed-off-by: Joerg Roedel --- arch/x86/include/asm/iommu.h | 1 - arch/x86/kernel/pci-dma.c| 11 +++ 2 files changed, 3 insertions(+), 9 deletions(-) diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h index baedab8ac538..b91623d521d9 100644 --- a/arch/x86/include/asm/iommu.h +++ b/arch/x86/include/asm/iommu.h @@ -4,7 +4,6 @@ extern int force_iommu, no_iommu; extern int iommu_detected; -extern int iommu_pass_through; /* 10 seconds */ #define DMAR_OPERATION_TIMEOUT ((cycles_t) tsc_khz*10*1000) diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c index f62b498b18fb..a6fd479d4a71 100644 --- a/arch/x86/kernel/pci-dma.c +++ b/arch/x86/kernel/pci-dma.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 #include #include +#include #include #include #include @@ -43,12 +44,6 @@ int iommu_detected __read_mostly = 0; * It is also possible to disable by default in kernel config, and enable with * iommu=nopt at boot time. */ -#ifdef CONFIG_IOMMU_DEFAULT_PASSTHROUGH -int iommu_pass_through __read_mostly = 1; -#else -int iommu_pass_through __read_mostly; -#endif - extern struct iommu_table_entry __iommu_table[], __iommu_table_end[]; void __init pci_iommu_alloc(void) @@ -120,9 +115,9 @@ static __init int iommu_setup(char *p) swiotlb = 1; #endif if (!strncmp(p, "pt", 2)) - iommu_pass_through = 1; + iommu_set_default_passthrough(); if (!strncmp(p, "nopt", 4)) - iommu_pass_through = 0; + iommu_set_default_translated(); gart_parse_options(p); -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 02/10] iommu/amd: Request passthrough mode from IOMMU core
From: Joerg Roedel Get rid of the iommu_pass_through variable and request passthrough mode via the new iommu core function. Signed-off-by: Joerg Roedel --- drivers/iommu/amd_iommu.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index b607a92791d3..7434b34d7a94 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -436,7 +436,7 @@ static int iommu_init_device(struct device *dev) * invalid address), we ignore the capability for the device so * it'll be forced to go into translation mode. */ - if ((iommu_pass_through || !amd_iommu_force_isolation) && + if ((iommu_default_passthrough() || !amd_iommu_force_isolation) && dev_is_pci(dev) && pci_iommuv2_capable(to_pci_dev(dev))) { struct amd_iommu *iommu; @@ -2226,7 +2226,7 @@ static int amd_iommu_add_device(struct device *dev) BUG_ON(!dev_data); - if (iommu_pass_through || dev_data->iommu_v2) + if (dev_data->iommu_v2) iommu_request_dm_for_dev(dev); /* Domains are initialized for this device - have a look what we ended up with */ @@ -2805,7 +2805,7 @@ int __init amd_iommu_init_api(void) int __init amd_iommu_init_dma_ops(void) { - swiotlb= (iommu_pass_through || sme_me_mask) ? 1 : 0; + swiotlb= (iommu_default_passthrough() || sme_me_mask) ? 1 : 0; iommu_detected = 1; if (amd_iommu_unmap_flush) -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 00/10 v2] Cleanup IOMMU passthrough setting (and disable IOMMU Passthrough when SME is active)
Hi, This patch-set started out small to overwrite the default passthrough setting (through CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y) when SME is active. But on the way to that Tom reminded me that the current ways to configure passthrough/no-passthrough modes for IOMMU on x86 is a mess. So I added a few more patches to clean that up a bit, getting rid of the iommu_pass_through variable on the way.This information is now kept only in iommu code, with helpers to change that setting from architecture code. And of course this patch-set still disables IOMMU Passthrough mode when SME is active even when CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y is set. The reason for that change is that SME with passthrough mode turned out to be fragile with devices requiring SWIOTLB, mainly because SWIOTLB has a maximum allocation size of 256kb and a limit overall size of the bounce buffer. Therefore having IOMMU in translation mode by default is better when SME is active on a system. Please review. Thanks, Joerg Changes since v1: - Cleaned up the kernel command line parameters to configure passthrough/translated mode, getting rid of the global iommu_pass_through variable Joerg Roedel (10): iommu: Add helpers to set/get default domain type iommu/amd: Request passthrough mode from IOMMU core iommu/vt-d: Request passthrough mode from IOMMU core x86/dma: Get rid of iommu_pass_through ia64: Get rid of iommu_pass_through iommu: Remember when default domain type was set on kernel command line iommu: Print default domain type on boot iommu: Set default domain type at runtime iommu: Disable passthrough mode when SME is active Documentation: Update Documentation for iommu.passthrough .../admin-guide/kernel-parameters.txt | 2 +- arch/ia64/include/asm/iommu.h | 2 - arch/ia64/kernel/pci-dma.c| 2 - arch/x86/include/asm/iommu.h | 1 - arch/x86/kernel/pci-dma.c | 11 +-- drivers/iommu/amd_iommu.c | 6 +- drivers/iommu/intel-iommu.c | 2 +- drivers/iommu/iommu.c | 83 +-- include/linux/iommu.h | 16 9 files changed, 101 insertions(+), 24 deletions(-) -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 03/10] iommu/vt-d: Request passthrough mode from IOMMU core
From: Joerg Roedel Get rid of the iommu_pass_through variable and request passthrough mode via the new iommu core function. Signed-off-by: Joerg Roedel --- drivers/iommu/intel-iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index bdaed2da8a55..234bc2b55c59 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -3267,7 +3267,7 @@ static int __init init_dmars(void) iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH); } - if (iommu_pass_through) + if (iommu_default_passthrough()) iommu_identity_mapping |= IDENTMAP_ALL; #ifdef CONFIG_INTEL_IOMMU_BROKEN_GFX_WA -- 2.17.1
[PATCH 07/10] iommu: Print default domain type on boot
From: Joerg Roedel Introduce a subsys_initcall for IOMMU code and use it to print the default domain type at boot. Signed-off-by: Joerg Roedel --- drivers/iommu/iommu.c | 30 +- 1 file changed, 29 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index e1feb4061b8b..233bc22b487e 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -93,12 +93,40 @@ struct iommu_group_attribute iommu_group_attr_##_name = \ static LIST_HEAD(iommu_device_list); static DEFINE_SPINLOCK(iommu_device_lock); +/* + * Use a function instead of an array here because the domain-type is a + * bit-field, so an array would waste memory. + */ +static const char *iommu_domain_type_str(unsigned int t) +{ + switch (t) { + case IOMMU_DOMAIN_BLOCKED: + return "Blocked"; + case IOMMU_DOMAIN_IDENTITY: + return "Passthrough"; + case IOMMU_DOMAIN_UNMANAGED: + return "Unmanaged"; + case IOMMU_DOMAIN_DMA: + return "Translated"; + default: + return "Unknown"; + } +} + +static int __init iommu_subsys_init(void) +{ + pr_info("Default domain type: %s\n", + iommu_domain_type_str(iommu_def_domain_type)); + + return 0; +} +subsys_initcall(iommu_subsys_init); + int iommu_device_register(struct iommu_device *iommu) { spin_lock(_device_lock); list_add_tail(>list, _device_list); spin_unlock(_device_lock); - return 0; } -- 2.17.1
[PATCH 01/10] iommu: Add helpers to set/get default domain type
From: Joerg Roedel Add a couple of functions to allow changing the default domain type from architecture code and a function for iommu drivers to request whether the default domain is passthrough. Signed-off-by: Joerg Roedel --- drivers/iommu/iommu.c | 16 include/linux/iommu.h | 16 2 files changed, 32 insertions(+) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 0c674d80c37f..f187e85a074b 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -2196,6 +2196,22 @@ int iommu_request_dma_domain_for_dev(struct device *dev) return request_default_domain_for_dev(dev, IOMMU_DOMAIN_DMA); } +void iommu_set_default_passthrough(void) +{ + iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY; +} + +void iommu_set_default_translated(void) +{ + iommu_def_domain_type = IOMMU_DOMAIN_DMA; +} + +bool iommu_default_passthrough(void) +{ + return iommu_def_domain_type == IOMMU_DOMAIN_IDENTITY; +} +EXPORT_SYMBOL_GPL(iommu_default_passthrough); + const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode) { const struct iommu_ops *ops = NULL; diff --git a/include/linux/iommu.h b/include/linux/iommu.h index fdc355ccc570..58c3e3e5f157 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -413,6 +413,9 @@ extern void iommu_get_resv_regions(struct device *dev, struct list_head *list); extern void iommu_put_resv_regions(struct device *dev, struct list_head *list); extern int iommu_request_dm_for_dev(struct device *dev); extern int iommu_request_dma_domain_for_dev(struct device *dev); +extern void iommu_set_default_passthrough(void); +extern void iommu_set_default_translated(void); +extern bool iommu_default_passthrough(void); extern struct iommu_resv_region * iommu_alloc_resv_region(phys_addr_t start, size_t length, int prot, enum iommu_resv_type type); @@ -694,6 +697,19 @@ static inline int iommu_request_dma_domain_for_dev(struct device *dev) return -ENODEV; } +static inline void iommu_set_default_passthrough(void) +{ +} + +static inline void iommu_set_default_translated(void) +{ +} + +static inline bool iommu_default_passthrough(void) +{ + return true; +} + static inline int iommu_attach_group(struct iommu_domain *domain, struct iommu_group *group) { -- 2.17.1
[PATCH 10/10] Documentation: Update Documentation for iommu.passthrough
From: Joerg Roedel This kernel parameter now takes also effect on X86. Signed-off-by: Joerg Roedel --- Documentation/admin-guide/kernel-parameters.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 47d981a86e2f..2d5dfa46e88a 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1811,7 +1811,7 @@ synchronously. iommu.passthrough= - [ARM64] Configure DMA to bypass the IOMMU by default. + [ARM64, X86] Configure DMA to bypass the IOMMU by default. Format: { "0" | "1" } 0 - Use IOMMU translation for DMA. 1 - Bypass the IOMMU for DMA. -- 2.17.1
[PATCH 08/10] iommu: Set default domain type at runtime
From: Joerg Roedel Set the default domain-type at runtime, not at compile-time. This keeps default domain type setting in one place when we have to change it at runtime. Signed-off-by: Joerg Roedel --- drivers/iommu/iommu.c | 23 +++ 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 233bc22b487e..96cc7cc8ab21 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -26,11 +26,8 @@ static struct kset *iommu_group_kset; static DEFINE_IDA(iommu_group_ida); -#ifdef CONFIG_IOMMU_DEFAULT_PASSTHROUGH -static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY; -#else -static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_DMA; -#endif + +static unsigned int iommu_def_domain_type __read_mostly; static bool iommu_dma_strict __read_mostly = true; static u32 iommu_cmd_line __read_mostly; @@ -76,7 +73,7 @@ static void iommu_set_cmd_line_dma_api(void) iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API; } -static bool __maybe_unused iommu_cmd_line_dma_api(void) +static bool iommu_cmd_line_dma_api(void) { return !!(iommu_cmd_line & IOMMU_CMD_LINE_DMA_API); } @@ -115,8 +112,18 @@ static const char *iommu_domain_type_str(unsigned int t) static int __init iommu_subsys_init(void) { - pr_info("Default domain type: %s\n", - iommu_domain_type_str(iommu_def_domain_type)); + bool cmd_line = iommu_cmd_line_dma_api(); + + if (!cmd_line) { + if (IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH)) + iommu_set_default_passthrough(); + else + iommu_set_default_translated(); + } + + pr_info("Default domain type: %s %s\n", + iommu_domain_type_str(iommu_def_domain_type), + cmd_line ? "(set via kernel command line)" : ""); return 0; } -- 2.17.1
[PATCH 06/10] iommu: Remember when default domain type was set on kernel command line
From: Joerg Roedel Introduce an extensible concept to remember when certain configuration settings for the IOMMU code have been set on the kernel command line. This will be used later to prevent overwriting these settings with other defaults. Signed-off-by: Joerg Roedel --- drivers/iommu/iommu.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index f187e85a074b..e1feb4061b8b 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -32,6 +32,7 @@ static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY; static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_DMA; #endif static bool iommu_dma_strict __read_mostly = true; +static u32 iommu_cmd_line __read_mostly; struct iommu_group { struct kobject kobj; @@ -68,6 +69,18 @@ static const char * const iommu_group_resv_type_string[] = { [IOMMU_RESV_SW_MSI] = "msi", }; +#define IOMMU_CMD_LINE_DMA_API (1 << 0) + +static void iommu_set_cmd_line_dma_api(void) +{ + iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API; +} + +static bool __maybe_unused iommu_cmd_line_dma_api(void) +{ + return !!(iommu_cmd_line & IOMMU_CMD_LINE_DMA_API); +} + #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store) \ struct iommu_group_attribute iommu_group_attr_##_name =\ __ATTR(_name, _mode, _show, _store) @@ -165,6 +178,8 @@ static int __init iommu_set_def_domain_type(char *str) if (ret) return ret; + iommu_set_cmd_line_dma_api(); + iommu_def_domain_type = pt ? IOMMU_DOMAIN_IDENTITY : IOMMU_DOMAIN_DMA; return 0; } -- 2.17.1
Re: [PATCH] iommu/arm-smmu-v3: add nr_ats_masters to avoid unnecessary operations
Hi, I've been struggling with the memory ordering requirements here. More below. On Thu, Aug 01, 2019 at 08:20:40PM +0800, Zhen Lei wrote: > When (smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS) is true, even if a > smmu domain does not contain any ats master, the operations of > arm_smmu_atc_inv_to_cmd() and lock protection in arm_smmu_atc_inv_domain() > are always executed. This will impact performance, especially in > multi-core and stress scenarios. For my FIO test scenario, about 8% > performance reduced. > > In fact, we can use a atomic member to record how many ats masters the > smmu contains. And check that without traverse the list and check all > masters one by one in the lock protection. > > Fixes: 9ce27afc0830 ("iommu/arm-smmu-v3: Add support for PCI ATS") > Signed-off-by: Zhen Lei > --- > drivers/iommu/arm-smmu-v3.c | 10 -- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c > index a9a9fabd396804a..1b370d9aca95f94 100644 > --- a/drivers/iommu/arm-smmu-v3.c > +++ b/drivers/iommu/arm-smmu-v3.c > @@ -631,6 +631,7 @@ struct arm_smmu_domain { > > struct io_pgtable_ops *pgtbl_ops; > boolnon_strict; > + atomic_tnr_ats_masters; > > enum arm_smmu_domain_stage stage; > union { > @@ -1531,7 +1532,7 @@ static int arm_smmu_atc_inv_domain(struct > arm_smmu_domain *smmu_domain, > struct arm_smmu_cmdq_ent cmd; > struct arm_smmu_master *master; > > - if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS)) > + if (!atomic_read(_domain->nr_ats_masters)) > return 0; This feels wrong to me: the CPU can speculate ahead of time that 'nr_ats_masters' is 0, but we could have a concurrent call to '->attach()' for an ATS-enabled device. Wouldn't it then be possible for the new device to populate its ATC as a result of speculative accesses for the mapping that we're tearing down? The devices lock solves this problem by serialising invalidation with '->attach()/->detach()' operations. John's suggestion of RCU might work better, but I think you'll need to call synchronize_rcu() between adding yourself to the 'devices' list and enabling ATS. What do you think? > arm_smmu_atc_inv_to_cmd(ssid, iova, size, ); > @@ -1869,6 +1870,7 @@ static int arm_smmu_enable_ats(struct arm_smmu_master > *master) > size_t stu; > struct pci_dev *pdev; > struct arm_smmu_device *smmu = master->smmu; > + struct arm_smmu_domain *smmu_domain = master->domain; > struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev); > > if (!(smmu->features & ARM_SMMU_FEAT_ATS) || !dev_is_pci(master->dev) || > @@ -1887,12 +1889,15 @@ static int arm_smmu_enable_ats(struct arm_smmu_master > *master) > return ret; > > master->ats_enabled = true; > + atomic_inc(_domain->nr_ats_masters); Here, we need to make sure that concurrent invalidation sees the updated 'nr_ats_masters' value before ATS is enabled for the device, otherwise we could miss an ATC invalidation. I think the code above gets this guarantee because of the way that ATS is enabled in the STE, which ensures that we issue invalidation commands before making the STE 'live'; this has the side-effect of a write barrier before updating PROD, which I think we also rely on for installing the CD pointer. Put another way: writes are ordered before a subsequent command insertion. Do you agree? If so, I'll add a comment because this is subtle and easily overlooked. > static void arm_smmu_disable_ats(struct arm_smmu_master *master) > { > struct arm_smmu_cmdq_ent cmd; > + struct arm_smmu_domain *smmu_domain = master->domain; > > if (!master->ats_enabled || !dev_is_pci(master->dev)) > return; > @@ -1901,6 +1906,7 @@ static void arm_smmu_disable_ats(struct arm_smmu_master > *master) > arm_smmu_atc_inv_master(master, ); > pci_disable_ats(to_pci_dev(master->dev)); > master->ats_enabled = false; > + atomic_dec(_domain->nr_ats_masters); This part is the other way around: now we need to ensure that we don't decrement 'nr_ats_masters' until we've disabled ATS. This works for a number of reasons, none of which are obvious: - The control dependency from completing the prior CMD_SYNCs for tearing down the STE and invalidating the ATC - The spinlock handover from the CMD_SYNCs above - The writel() when poking PCI configuration space in pci_disable_ats() happens to be implemented with a write-write barrier I suppose the control dependency is the most compelling one: we can't let stores out whilst we're awaiting completion of a CMD_SYNC. Put another way: writes are ordered after the completion of a prior CMD_SYNC. But yeah, I need to write this down. > static void arm_smmu_detach_dev(struct arm_smmu_master *master) > @@ -1915,10 +1921,10 @@
Re: [PATCH] iommu/amd: Override wrong IVRS IOAPIC on Raven Ridge systems
On Tue, Aug 13, 2019 at 11:58:48AM +0800, Kai-Heng Feng wrote: > at 23:39, Joerg Roedel wrote: > > > On Thu, Aug 08, 2019 at 06:17:07PM +0800, Kai-Heng Feng wrote: > > > Raven Ridge systems may have malfunction touchpad or hang at boot if > > > incorrect IVRS IOAPIC is provided by BIOS. > > > > > > Users already found correct "ivrs_ioapic=" values, let's put them inside > > > kernel to workaround buggy BIOS. > > > > Will that still work when a fixed BIOS for these laptops is released? > > Do you mean that we should stop applying these quirks once a BIOS fix is > confirmed? My concern is just that these quirks break some systems that don't need them. > We can modify the quirk to compare BIOS version, if there’s an unlikely BIOS > update really fixes the issue. > Before that happens, I think it’s OK to let the quirks stay this way. A BIOS version check is not making things better here as it might lock out systems that need the quirk. I think we can leave it as it for now, but can you create a new file amd_iommu_quirks.c and move the code there. And in the struct and function names please make clear that it is about ivrs-quirks. Regards, Joerg
Re: [PATCH] iommu: exynos: Remove __init annotation from exynos_sysmmu_probe()
On Mon, Aug 12, 2019 at 12:32:46PM +0200, Marek Szyprowski wrote: > Exynos SYSMMU driver supports deferred probe. It happens when clocks > needed for this driver are not yet available. Typically next calls to > driver ->probe() happen before init section is free, but this is not > really guaranteed. To make if safe, remove __init annotation from > exynos_sysmmu_probe() function. > > Signed-off-by: Marek Szyprowski Applied, thanks. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v6 5/8] iommu: Add bounce page APIs
Hi Lu Baolu, On Tue, Jul 30, 2019 at 12:52:26PM +0800, Lu Baolu wrote: > * iommu_bounce_map(dev, addr, paddr, size, dir, attrs) > - Map a buffer start at DMA address @addr in bounce page > manner. For buffer parts that doesn't cross a whole > minimal IOMMU page, the bounce page policy is applied. > A bounce page mapped by swiotlb will be used as the DMA > target in the IOMMU page table. Otherwise, the physical > address @paddr is mapped instead. > > * iommu_bounce_unmap(dev, addr, size, dir, attrs) > - Unmap the buffer mapped with iommu_bounce_map(). The bounce > page will be torn down after the bounced data get synced. > > * iommu_bounce_sync(dev, addr, size, dir, target) > - Synce the bounced data in case the bounce mapped buffer is > reused. I don't really get why this API extension is needed for your use-case. Can't this just be done using iommu_map/unmap operations? Can you please elaborate a bit why these functions are needed? Regards, Joerg
Re: [PATCH v9 00/21] MT8183 IOMMU SUPPORT
On Wed, Aug 14, 2019 at 10:18:25AM +0200, Joerg Roedel wrote: > On Sat, Aug 10, 2019 at 03:58:00PM +0800, Yong Wu wrote: > > Change notes: > > v9: > >1) rebase on v5.3-rc1. > >2) In v7s, Use oas to implement MTK 4GB mode. It nearly reconstruct the > > patch, so I don't keep the R-b. > > Okay, this looks close to being ready, just the io-pgtable patches still > need review. On my list for today :) (Today is SMMU day for me. Send coffee.) Will
Re: [PATCH v9 00/21] MT8183 IOMMU SUPPORT
On Sat, Aug 10, 2019 at 03:58:00PM +0800, Yong Wu wrote: > Change notes: > v9: >1) rebase on v5.3-rc1. >2) In v7s, Use oas to implement MTK 4GB mode. It nearly reconstruct the > patch, so I don't keep the R-b. Okay, this looks close to being ready, just the io-pgtable patches still need review. Regards, Joerg