Re: [PATCH] iommu/vt-d: call __dmar_remove_one_dev_info with valid pointer
On 1/22/20 8:34 AM, Jerry Snitselaar wrote: It is possible for archdata.iommu to be set to DEFER_DEVICE_DOMAIN_INFO or DUMMY_DEVICE_DOMAIN_INFO so check for those values before calling __dmar_remove_one_dev_info. Without a check it can result in a null pointer dereference. This has been seen while booting a kdump kernel on an HP dl380 gen9. Cc: Joerg Roedel Cc: Lu Baolu Cc: David Woodhouse Cc: sta...@vger.kernel.org # 5.3+ Cc: linux-ker...@vger.kernel.org Fixes: ae23bfb68f28 ("iommu/vt-d: Detach domain before using a private one") Signed-off-by: Jerry Snitselaar Acked-by: Lu Baolu Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH 3/4] iommu: Preallocate iommu group when probing devices
Hi Robin, On 1/21/20 8:45 PM, Robin Murphy wrote: On 19/01/2020 6:29 am, Lu Baolu wrote: Hi Joerg, On 1/17/20 6:21 PM, Joerg Roedel wrote: On Wed, Jan 01, 2020 at 01:26:47PM +0800, Lu Baolu wrote: This splits iommu group allocation from adding devices. This makes it possible to determine the default domain type for each group as all devices belonging to the group have been determined. I think its better to keep group allocation as it is and just defer default domain allocation after each device is in its group. But take I tried defering default domain allocation, but it seems not possible. The call path of adding devices into their groups: iommu_probe_device -> ops->add_device(dev) -> (iommu vendor driver) iommu_group_get_for_dev(dev) After doing this, the vendor driver will get the default domain and apply dma_ops according to the domain type. If we defer the domain allocation, they will get a NULL default domain and cause panic in the vendor driver. Any suggestions? https://lore.kernel.org/linux-iommu/6dbbfc10-3247-744c-ae8d-443a336e0...@linux.intel.com/ Haven't we been here before? ;) Since we can't (safely or reasonably) change a group's default domain after ops->add_device() has returned, and in general it gets impractical to evaluate "all device in a group" once you look beyond &pci_bus_type (or consider hotplug as mentioned), then AFAICS there's no reasonable way to get away from the default domain type being defined by the first device to attach. Yes, agreed. But in practice it's hardly a problem anyway - if every device in a given group requests the same domain type then it doesn't matter which comes first, and if they don't then we ultimately end up with an impossible set of constraints, so are doomed to do the 'wrong' thing regardless. The third case is, for example, three devices A, B, C in a group. The first device A is neutral about which type of default domain type is used. So the iommu framework will use a static default domain. But the device B requires to use a specific one which is different from the default. Currently, this is handled in the vendor iommu driver and one motivation of this patch set is to handle this in the generic layer. Thus unless anyone wants to start redefining the whole group concept to separate the notions of ID aliasing and peer-to-peer isolation (which still wouldn't necessarily help), I think this user override effectively boils down to just another flavour of iommu_request_*_for_dev(), and as such comes right back to the "just pass the bloody device to ops->domain_alloc() and resolve everything up-front" argument. Robin. Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH 0/4] iommu: Per-group default domain type
Hi, On 1/21/20 6:14 PM, John Garry wrote: On 21/01/2020 00:43, Lu Baolu wrote: An IOMMU group represents the smallest set of devices that are considered to be isolated. All devices belonging to an IOMMU group share a default domain for DMA APIs. There are two types of default domain: IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_IDENTITY. The former means IOMMU translation, while the latter means IOMMU by-pass. Currently, the default domain type for the IOMMU groups is determined globally. All IOMMU groups use a single default domain type. The global default domain type can be adjusted by kernel build configuration or kernel parameters. More and more users are looking forward to a fine grained default domain type. For example, with the global default domain type set to translation, the OEM verndors or end users might want some trusted and fast-speed devices to bypass IOMMU for performance gains. On the other hand, with global default domain type set to by-pass, some devices with limited system memory addressing capability might want IOMMU translation to remove the bounce buffer overhead. Hi Lu Baolu, Do you think that it would be a more common usecase to want kernel-managed devices to be passthrough for performance reasons and some select devices to be in DMA domain, like those with limited address cap or whose drivers request huge amounts of memory? I just think it would be more manageable to set kernel commandline parameters for this, i.e. those select few which want DMA domain. Hi Baolu, It's just two sides of a coin. Currently, iommu subsystem make DMA domain by default, that's the reason why I selected to let user set which devices are willing to use identity domains. OK, understood. There was an alternate solution here which would allow per-group type to be updated via sysfs: https://lore.kernel.org/linux-iommu/cover.1566353521.git.sai.praneeth.prak...@intel.com/ Yes. My patch set just tries to do this statically during boot time. Any idea what happened to that? No idea. Sai might have more information. :-) Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH 2/4] PCI: Add "pci=iommu_passthrough=" parameter for iommu passthrough
Hi Bjorn, On 1/21/20 10:17 PM, Bjorn Helgaas wrote: [+cc linux-pci, thread athttps://lore.kernel.org/r/20200101052648.14295-1-baolu...@linux.intel.com] On Wed, Jan 01, 2020 at 01:26:46PM +0800, Lu Baolu wrote: The new parameter takes a list of devices separated by a semicolon. Each device specified will have its iommu_passthrough bit in struct device set. This is very similar to the existing 'disable_acs_redir' parameter. Almost all of this patchset is in drivers/iommu. Should the parameter be "iommu ..." instead of "pci=iommu_passthrough=..."? There is already an "iommu.passthrough=" argument. Would this fit better there? Since the iommu_passthrough bit is generic, it seems like you anticipate similar situations for non-PCI devices. Yes. Fair enough. Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v5 5/7] iommu/vt-d: Remove VMD child device sanity check
On 1/21/20 9:37 PM, Jon Derrick wrote: This removes the sanity check required for VMD child devices. The new pci_real_dma_dev() DMA alias mechanism places them in the same IOMMU group as the VMD endpoint. Assignment of the group would require assigning the VMD endpoint, where unbinding the VMD endpoint removes the child device domain from the hierarchy. Signed-off-by: Jon Derrick Acked-by: Lu Baolu Thanks, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v5 4/7] iommu/vt-d: Use pci_real_dma_dev() for mapping
On 1/21/20 9:37 PM, Jon Derrick wrote: The PCI device may have a DMA requester on another bus, such as VMD subdevices needing to use the VMD endpoint. This case requires the real DMA device when mapping to IOMMU. Signed-off-by: Jon Derrick Acked-by: Lu Baolu Thanks, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.
On Thu, Dec 19, 2019 at 08:52:45PM -0500, Konrad Rzeszutek Wilk wrote: > On Mon, Dec 09, 2019 at 11:13:46PM +, Ashish Kalra wrote: > > From: Ashish Kalra > > > > For SEV, all DMA to and from guest has to use shared > > (un-encrypted) pages. SEV uses SWIOTLB to make this happen > > without requiring changes to device drivers. However, > > depending on workload being run, the default 64MB of SWIOTLB > > might not be enough and SWIOTLB may run out of buffers to > > use for DMA, resulting in I/O errors. > > > > Increase the default size of SWIOTLB for SEV guests using > > a minimum value of 128MB and a maximum value of 512MB, > > determining on amount of provisioned guest memory. > > > > The SWIOTLB default size adjustment is added as an > > architecture specific interface/callback to allow > > architectures such as those supporting memory encryption > > to adjust/expand SWIOTLB size for their use. > > What if this was made dynamic? That is if there is a memory > pressure you end up expanding the SWIOTLB dynamically? As of now we want to keep it as simple as possible and more like a stop-gap arrangement till something more elegant is available. > >> Also is it worth doing this calculation based on memory or >> more on the # of PCI devices + their MMIO ranges size? Additional memory calculations based on # of PCI devices and their memory ranges will make it more complicated with so many other permutations and combinations to explore, it is essential to keep this patch as simple as possible by adjusting the bounce buffer size simply by determining it from the amount of provisioned guest memory. Thanks, Ashish ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] iommu/vt-d: call __dmar_remove_one_dev_info with valid pointer
It is possible for archdata.iommu to be set to DEFER_DEVICE_DOMAIN_INFO or DUMMY_DEVICE_DOMAIN_INFO so check for those values before calling __dmar_remove_one_dev_info. Without a check it can result in a null pointer dereference. This has been seen while booting a kdump kernel on an HP dl380 gen9. Cc: Joerg Roedel Cc: Lu Baolu Cc: David Woodhouse Cc: sta...@vger.kernel.org # 5.3+ Cc: linux-ker...@vger.kernel.org Fixes: ae23bfb68f28 ("iommu/vt-d: Detach domain before using a private one") Signed-off-by: Jerry Snitselaar --- drivers/iommu/intel-iommu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 1801f0aaf013..932267f49f9a 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -5163,7 +5163,8 @@ static void dmar_remove_one_dev_info(struct device *dev) spin_lock_irqsave(&device_domain_lock, flags); info = dev->archdata.iommu; - if (info) + if (info && info != DEFER_DEVICE_DOMAIN_INFO + && info != DUMMY_DEVICE_DOMAIN_INFO) __dmar_remove_one_dev_info(info); spin_unlock_irqrestore(&device_domain_lock, flags); } -- 2.24.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.
On Tue, Jan 21, 2020 at 08:09:47PM +, Ashish Kalra wrote: > On Thu, Dec 19, 2019 at 08:52:45PM -0500, Konrad Rzeszutek Wilk wrote: > > On Mon, Dec 09, 2019 at 11:13:46PM +, Ashish Kalra wrote: > > > From: Ashish Kalra > > > > > > For SEV, all DMA to and from guest has to use shared > > > (un-encrypted) pages. SEV uses SWIOTLB to make this happen > > > without requiring changes to device drivers. However, > > > depending on workload being run, the default 64MB of SWIOTLB > > > might not be enough and SWIOTLB may run out of buffers to > > > use for DMA, resulting in I/O errors. > > > > > > Increase the default size of SWIOTLB for SEV guests using > > > a minimum value of 128MB and a maximum value of 512MB, > > > determining on amount of provisioned guest memory. > > > > > > The SWIOTLB default size adjustment is added as an > > > architecture specific interface/callback to allow > > > architectures such as those supporting memory encryption > > > to adjust/expand SWIOTLB size for their use. > > > > What if this was made dynamic? That is if there is a memory > > pressure you end up expanding the SWIOTLB dynamically? > > As of now we want to keep it as simple as possible and more > like a stop-gap arrangement till something more elegant is > available. That is nice. But past experience has shown that stop-gap arrangments end up being the defacto solution. > > > > >> Also is it worth doing this calculation based on memory or > >> more on the # of PCI devices + their MMIO ranges size? > > Additional memory calculations based on # of PCI devices and > their memory ranges will make it more complicated with so > many other permutations and combinations to explore, it is > essential to keep this patch as simple as possible by > adjusting the bounce buffer size simply by determining it > from the amount of provisioned guest memory. Please rework the patch to: - Use a log solution instead of the multiplication. Feel free to cap it at a sensible value. - Also the code depends on SWIOTLB calling in to the adjust_swiotlb_default_size which looks wrong. You should not adjust io_tlb_nslabs from swiotlb_size_or_default. That function's purpose is to report a value. - Make io_tlb_nslabs be visible outside of the SWIOTLB code. - Can you utilize the IOMMU_INIT APIs and have your own detect which would modify the io_tlb_nslabs (and set swiotbl=1?). Actually you seem to be piggybacking on pci_swiotlb_detect_4gb - so perhaps add in this code ? Albeit it really should be in it's own file, not in arch/x86/kernel/pci-swiotlb.c - Tweak the code in the swiotlb code to make sure it can deal with io_tlb_nslabs being modified outside of the code at the start. It should have no trouble, but only testing will tell for sure. > > Thanks, > Ashish ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v5 7/7] x86/PCI: Remove X86_DEV_DMA_OPS
From: Christoph Hellwig There are no users of X86_DEV_DMA_OPS left, so remove the code. Reviewed-by: Jon Derrick Signed-off-by: Christoph Hellwig --- arch/x86/Kconfig | 3 --- arch/x86/include/asm/device.h | 10 -- arch/x86/pci/common.c | 38 -- 3 files changed, 51 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 5e89499..77f9426 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -2955,9 +2955,6 @@ config HAVE_ATOMIC_IOMAP def_bool y depends on X86_32 -config X86_DEV_DMA_OPS - bool - source "drivers/firmware/Kconfig" source "arch/x86/kvm/Kconfig" diff --git a/arch/x86/include/asm/device.h b/arch/x86/include/asm/device.h index 5e12c63..7e31f7f 100644 --- a/arch/x86/include/asm/device.h +++ b/arch/x86/include/asm/device.h @@ -8,16 +8,6 @@ struct dev_archdata { #endif }; -#if defined(CONFIG_X86_DEV_DMA_OPS) && defined(CONFIG_PCI_DOMAINS) -struct dma_domain { - struct list_head node; - const struct dma_map_ops *dma_ops; - int domain_nr; -}; -void add_dma_domain(struct dma_domain *domain); -void del_dma_domain(struct dma_domain *domain); -#endif - struct pdev_archdata { }; diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c index fe21a5c..df1d959 100644 --- a/arch/x86/pci/common.c +++ b/arch/x86/pci/common.c @@ -625,43 +625,6 @@ unsigned int pcibios_assign_all_busses(void) return (pci_probe & PCI_ASSIGN_ALL_BUSSES) ? 1 : 0; } -#if defined(CONFIG_X86_DEV_DMA_OPS) && defined(CONFIG_PCI_DOMAINS) -static LIST_HEAD(dma_domain_list); -static DEFINE_SPINLOCK(dma_domain_list_lock); - -void add_dma_domain(struct dma_domain *domain) -{ - spin_lock(&dma_domain_list_lock); - list_add(&domain->node, &dma_domain_list); - spin_unlock(&dma_domain_list_lock); -} -EXPORT_SYMBOL_GPL(add_dma_domain); - -void del_dma_domain(struct dma_domain *domain) -{ - spin_lock(&dma_domain_list_lock); - list_del(&domain->node); - spin_unlock(&dma_domain_list_lock); -} -EXPORT_SYMBOL_GPL(del_dma_domain); - -static void set_dma_domain_ops(struct pci_dev *pdev) -{ - struct dma_domain *domain; - - spin_lock(&dma_domain_list_lock); - list_for_each_entry(domain, &dma_domain_list, node) { - if (pci_domain_nr(pdev->bus) == domain->domain_nr) { - pdev->dev.dma_ops = domain->dma_ops; - break; - } - } - spin_unlock(&dma_domain_list_lock); -} -#else -static void set_dma_domain_ops(struct pci_dev *pdev) {} -#endif - static void set_dev_domain_options(struct pci_dev *pdev) { if (is_vmd(pdev->bus)) @@ -697,7 +660,6 @@ int pcibios_add_device(struct pci_dev *dev) pa_data = data->next; memunmap(data); } - set_dma_domain_ops(dev); set_dev_domain_options(dev); return 0; } -- 1.8.3.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v5 0/7] Clean up VMD DMA Map Ops
v4 Set: https://lore.kernel.org/linux-pci/20200120110220.gb17...@e121166-lin.cambridge.arm.com/T/#t v3 Set: https://lore.kernel.org/linux-iommu/20200113181742.ga27...@e121166-lin.cambridge.arm.com/T/#t v2 Set: https://lore.kernel.org/linux-iommu/1578580256-3483-1-git-send-email-jonathan.derr...@intel.com/T/#t v1 Set: https://lore.kernel.org/linux-iommu/20200107134125.gd30...@8bytes.org/T/#t VMD currently works with VT-d enabled by pointing DMA and IOMMU actions at the VMD endpoint. The problem with this approach is that the VMD endpoint's device-specific attributes, such as the DMA Mask Bits, are used instead of the child device's attributes. This set cleans up VMD by removing the override that redirects DMA map operations to the VMD endpoint. Instead it introduces a new DMA alias mechanism into the existing DMA alias infrastructure. This new DMA alias mechanism allows an architecture-specific pci_real_dma_dev() function to provide a pointer from a pci_dev to its PCI DMA device, where by default it returns the original pci_dev. In addition, this set removes the sanity check that was added to prevent assigning VMD child devices. By using the DMA alias mechanism, all child devices are assigned the same IOMMU group as the VMD endpoint. This removes the need for restricting VMD child devices from assignment, as the whole group would have to be assigned, requiring unbinding the VMD driver and removing the child device domain. v1 added a pointer in struct pci_dev that pointed to the DMA alias' struct pci_dev and did the necessary DMA alias and IOMMU modifications. v2 introduced a new weak function to reference the 'Direct DMA Alias', and removed the need to add a pointer in struct device or pci_dev. Weak functions are generally frowned upon when it's a single architecture implementation, so I am open to alternatives. v3 referenced the pci_dev rather than the struct device for the PCI 'Direct DMA Alias' (pci_direct_dma_alias()). This revision also allowed pci_for_each_dma_alias() to call any DMA aliases for the Direct DMA alias device, though I don't expect the VMD endpoint to need intra-bus DMA aliases. v4 changes the 'Direct DMA Alias' to instead refer to the 'Real DMA Dev', which either returns the PCI device itself or the PCI DMA device. v5 Fixes a bad call argument to pci_real_dma_dev that would have broken bisection. This revision also changes one of the calls to a one-liner, and assembles the same on my system. Changes from v4: Fix pci_real_dma_dev() call in 4/7. Change other pci_real_dma_dev() call in 4/7 to one-liner. Changes from v3: Uses pci_real_dma_dev() instead of pci_direct_dma_alias() Split IOMMU enabling, IOMMU VMD sanity check and VMD dma_map_ops cleanup into three patches Changes from v2: Uses struct pci_dev for PCI Device 'Direct DMA aliasing' (pci_direct_dma_alias) Allows pci_for_each_dma_alias to iterate over the alias mask of the 'Direct DMA alias' Changes from v1: Removed 1/5 & 2/5 misc fix patches that were merged Uses Christoph's staging/cleanup patches Introduce weak function rather than including pointer in struct device or pci_dev. Based on Bjorn's next: https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/log/?h=next Christoph Hellwig (2): x86/PCI: Add a to_pci_sysdata helper x86/PCI: Remove X86_DEV_DMA_OPS Jon Derrick (5): x86/PCI: Expose VMD's PCI Device in pci_sysdata PCI: Introduce pci_real_dma_dev() iommu/vt-d: Use pci_real_dma_dev() for mapping iommu/vt-d: Remove VMD child device sanity check PCI: vmd: Stop overriding dma_map_ops arch/x86/Kconfig | 3 - arch/x86/include/asm/device.h | 10 --- arch/x86/include/asm/pci.h | 31 - arch/x86/pci/common.c | 48 +++-- drivers/iommu/intel-iommu.c| 11 ++- drivers/pci/controller/Kconfig | 1 - drivers/pci/controller/vmd.c | 152 + drivers/pci/pci.c | 19 +- drivers/pci/search.c | 6 ++ include/linux/pci.h| 1 + 10 files changed, 54 insertions(+), 228 deletions(-) -- 1.8.3.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v5 5/7] iommu/vt-d: Remove VMD child device sanity check
This removes the sanity check required for VMD child devices. The new pci_real_dma_dev() DMA alias mechanism places them in the same IOMMU group as the VMD endpoint. Assignment of the group would require assigning the VMD endpoint, where unbinding the VMD endpoint removes the child device domain from the hierarchy. Signed-off-by: Jon Derrick --- drivers/iommu/intel-iommu.c | 10 +- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 72f26e8..7e2c492f 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -774,15 +774,7 @@ static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devf if (dev_is_pci(dev)) { struct pci_dev *pf_pdev; - pdev = to_pci_dev(dev); - -#ifdef CONFIG_X86 - /* VMD child devices currently cannot be handled individually */ - if (is_vmd(pdev->bus)) - return NULL; -#endif - - pdev = pci_real_dma_dev(pdev); + pdev = pci_real_dma_dev(to_pci_dev(dev)); /* VFs aren't listed in scope tables; we need to look up * the PF instead to find the IOMMU. */ -- 1.8.3.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v5 2/7] x86/PCI: Expose VMD's PCI Device in pci_sysdata
To be used by Intel-IOMMU code to find the correct domain. CC: Christoph Hellwig Signed-off-by: Jon Derrick --- arch/x86/include/asm/pci.h | 4 ++-- drivers/pci/controller/vmd.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h index a4e09db60..6512c54 100644 --- a/arch/x86/include/asm/pci.h +++ b/arch/x86/include/asm/pci.h @@ -25,7 +25,7 @@ struct pci_sysdata { void*fwnode;/* IRQ domain for MSI assignment */ #endif #if IS_ENABLED(CONFIG_VMD) - bool vmd_domain;/* True if in Intel VMD domain */ + struct pci_dev *vmd_dev; /* VMD Device if in Intel VMD domain */ #endif }; @@ -64,7 +64,7 @@ static inline void *_pci_root_bus_fwnode(struct pci_bus *bus) #if IS_ENABLED(CONFIG_VMD) static inline bool is_vmd(struct pci_bus *bus) { - return to_pci_sysdata(bus)->vmd_domain; + return to_pci_sysdata(bus)->vmd_dev != NULL; } #else #define is_vmd(bus)false diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c index 2128422..d67ad56 100644 --- a/drivers/pci/controller/vmd.c +++ b/drivers/pci/controller/vmd.c @@ -679,7 +679,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features) .parent = res, }; - sd->vmd_domain = true; + sd->vmd_dev = vmd->dev; sd->domain = vmd_find_free_domain(); if (sd->domain < 0) return sd->domain; -- 1.8.3.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v5 3/7] PCI: Introduce pci_real_dma_dev()
The current DMA alias implementation requires the aliased device be on the same PCI bus as the requester ID. This introduces an arch-specific mechanism to point to another PCI device when doing mapping and PCI DMA alias search. The default case returns the actual device. CC: Christoph Hellwig Signed-off-by: Jon Derrick --- arch/x86/pci/common.c | 10 ++ drivers/pci/pci.c | 19 ++- drivers/pci/search.c | 6 ++ include/linux/pci.h | 1 + 4 files changed, 35 insertions(+), 1 deletion(-) diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c index 1e59df0..fe21a5c 100644 --- a/arch/x86/pci/common.c +++ b/arch/x86/pci/common.c @@ -736,3 +736,13 @@ int pci_ext_cfg_avail(void) else return 0; } + +#if IS_ENABLED(CONFIG_VMD) +struct pci_dev *pci_real_dma_dev(struct pci_dev *dev) +{ + if (is_vmd(dev->bus)) + return to_pci_sysdata(dev->bus)->vmd_dev; + + return dev; +} +#endif diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 581b177..36d24f2 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -6048,7 +6048,9 @@ bool pci_devs_are_dma_aliases(struct pci_dev *dev1, struct pci_dev *dev2) return (dev1->dma_alias_mask && test_bit(dev2->devfn, dev1->dma_alias_mask)) || (dev2->dma_alias_mask && - test_bit(dev1->devfn, dev2->dma_alias_mask)); + test_bit(dev1->devfn, dev2->dma_alias_mask)) || + pci_real_dma_dev(dev1) == dev2 || + pci_real_dma_dev(dev2) == dev1; } bool pci_device_is_present(struct pci_dev *pdev) @@ -6072,6 +6074,21 @@ void pci_ignore_hotplug(struct pci_dev *dev) } EXPORT_SYMBOL_GPL(pci_ignore_hotplug); +/** + * pci_real_dma_dev - Get PCI DMA device for PCI device + * @dev: the PCI device that may have a PCI DMA alias + * + * Permits the platform to provide architecture-specific functionality to + * devices needing to alias DMA to another PCI device on another PCI bus. If + * the PCI device is on the same bus, it is recommended to use + * pci_add_dma_alias(). This is the default implementation. Architecture + * implementations can override this. + */ +struct pci_dev __weak *pci_real_dma_dev(struct pci_dev *dev) +{ + return dev; +} + resource_size_t __weak pcibios_default_alignment(void) { return 0; diff --git a/drivers/pci/search.c b/drivers/pci/search.c index e4dbdef..2061672 100644 --- a/drivers/pci/search.c +++ b/drivers/pci/search.c @@ -32,6 +32,12 @@ int pci_for_each_dma_alias(struct pci_dev *pdev, struct pci_bus *bus; int ret; + /* +* The device may have an explicit alias requester ID for DMA where the +* requester is on another PCI bus. +*/ + pdev = pci_real_dma_dev(pdev); + ret = fn(pdev, pci_dev_id(pdev), data); if (ret) return ret; diff --git a/include/linux/pci.h b/include/linux/pci.h index 930fab2..3840a54 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1202,6 +1202,7 @@ u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev **limiting_dev, int pci_select_bars(struct pci_dev *dev, unsigned long flags); bool pci_device_is_present(struct pci_dev *pdev); void pci_ignore_hotplug(struct pci_dev *dev); +struct pci_dev *pci_real_dma_dev(struct pci_dev *dev); int __printf(6, 7) pci_request_irq(struct pci_dev *dev, unsigned int nr, irq_handler_t handler, irq_handler_t thread_fn, void *dev_id, -- 1.8.3.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v5 6/7] PCI: vmd: Stop overriding dma_map_ops
Devices on the VMD domain use the VMD endpoint's requester ID and have been relying on the VMD endpoint's DMA operations. The problem with this was that VMD domain devices would use the VMD endpoint's attributes when doing DMA and IOMMU mapping. We can be smarter about this by only using the VMD endpoint when mapping and providing the correct child device's attributes during DMA operations. This patch removes the dma_map_ops redirect. Signed-off-by: Jon Derrick --- drivers/pci/controller/Kconfig | 1 - drivers/pci/controller/vmd.c | 150 - 2 files changed, 151 deletions(-) diff --git a/drivers/pci/controller/Kconfig b/drivers/pci/controller/Kconfig index 918e283..20bf00f 100644 --- a/drivers/pci/controller/Kconfig +++ b/drivers/pci/controller/Kconfig @@ -239,7 +239,6 @@ config PCIE_TANGO_SMP8759 config VMD depends on PCI_MSI && X86_64 && SRCU - select X86_DEV_DMA_OPS tristate "Intel Volume Management Device Driver" ---help--- Adds support for the Intel Volume Management Device (VMD). VMD is a diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c index d67ad56..fe1acb0 100644 --- a/drivers/pci/controller/vmd.c +++ b/drivers/pci/controller/vmd.c @@ -98,9 +98,6 @@ struct vmd_dev { struct irq_domain *irq_domain; struct pci_bus *bus; u8 busn_start; - - struct dma_map_ops dma_ops; - struct dma_domain dma_domain; }; static inline struct vmd_dev *vmd_from_bus(struct pci_bus *bus) @@ -295,151 +292,6 @@ static void vmd_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc) .chip = &vmd_msi_controller, }; -/* - * VMD replaces the requester ID with its own. DMA mappings for devices in a - * VMD domain need to be mapped for the VMD, not the device requiring - * the mapping. - */ -static struct device *to_vmd_dev(struct device *dev) -{ - struct pci_dev *pdev = to_pci_dev(dev); - struct vmd_dev *vmd = vmd_from_bus(pdev->bus); - - return &vmd->dev->dev; -} - -static void *vmd_alloc(struct device *dev, size_t size, dma_addr_t *addr, - gfp_t flag, unsigned long attrs) -{ - return dma_alloc_attrs(to_vmd_dev(dev), size, addr, flag, attrs); -} - -static void vmd_free(struct device *dev, size_t size, void *vaddr, -dma_addr_t addr, unsigned long attrs) -{ - return dma_free_attrs(to_vmd_dev(dev), size, vaddr, addr, attrs); -} - -static int vmd_mmap(struct device *dev, struct vm_area_struct *vma, - void *cpu_addr, dma_addr_t addr, size_t size, - unsigned long attrs) -{ - return dma_mmap_attrs(to_vmd_dev(dev), vma, cpu_addr, addr, size, - attrs); -} - -static int vmd_get_sgtable(struct device *dev, struct sg_table *sgt, - void *cpu_addr, dma_addr_t addr, size_t size, - unsigned long attrs) -{ - return dma_get_sgtable_attrs(to_vmd_dev(dev), sgt, cpu_addr, addr, size, - attrs); -} - -static dma_addr_t vmd_map_page(struct device *dev, struct page *page, - unsigned long offset, size_t size, - enum dma_data_direction dir, - unsigned long attrs) -{ - return dma_map_page_attrs(to_vmd_dev(dev), page, offset, size, dir, - attrs); -} - -static void vmd_unmap_page(struct device *dev, dma_addr_t addr, size_t size, - enum dma_data_direction dir, unsigned long attrs) -{ - dma_unmap_page_attrs(to_vmd_dev(dev), addr, size, dir, attrs); -} - -static int vmd_map_sg(struct device *dev, struct scatterlist *sg, int nents, - enum dma_data_direction dir, unsigned long attrs) -{ - return dma_map_sg_attrs(to_vmd_dev(dev), sg, nents, dir, attrs); -} - -static void vmd_unmap_sg(struct device *dev, struct scatterlist *sg, int nents, -enum dma_data_direction dir, unsigned long attrs) -{ - dma_unmap_sg_attrs(to_vmd_dev(dev), sg, nents, dir, attrs); -} - -static void vmd_sync_single_for_cpu(struct device *dev, dma_addr_t addr, - size_t size, enum dma_data_direction dir) -{ - dma_sync_single_for_cpu(to_vmd_dev(dev), addr, size, dir); -} - -static void vmd_sync_single_for_device(struct device *dev, dma_addr_t addr, - size_t size, enum dma_data_direction dir) -{ - dma_sync_single_for_device(to_vmd_dev(dev), addr, size, dir); -} - -static void vmd_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, - int nents, enum dma_data_direction dir) -{ - dma_sync_sg_for_cpu(to_vmd_dev(dev), sg, nents, dir); -} - -static void vmd_sync_sg_for_device(struct device *dev, struct scatterlist *sg, - in
[PATCH v5 4/7] iommu/vt-d: Use pci_real_dma_dev() for mapping
The PCI device may have a DMA requester on another bus, such as VMD subdevices needing to use the VMD endpoint. This case requires the real DMA device when mapping to IOMMU. Signed-off-by: Jon Derrick --- drivers/iommu/intel-iommu.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 0c8d81f..72f26e8 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -782,6 +782,8 @@ static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devf return NULL; #endif + pdev = pci_real_dma_dev(pdev); + /* VFs aren't listed in scope tables; we need to look up * the PF instead to find the IOMMU. */ pf_pdev = pci_physfn(pdev); @@ -2428,6 +2430,9 @@ static struct dmar_domain *find_domain(struct device *dev) dev->archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO)) return NULL; + if (dev_is_pci(dev)) + dev = &pci_real_dma_dev(to_pci_dev(dev))->dev; + /* No lock here, assumes no domain exit in normal case */ info = dev->archdata.iommu; if (likely(info)) -- 1.8.3.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v5 1/7] x86/PCI: Add a to_pci_sysdata helper
From: Christoph Hellwig Various helpers need the pci_sysdata just to dereference a single field in it. Add a little helper that returns the properly typed sysdata pointer to require a little less boilerplate code. Signed-off-by: Christoph Hellwig [jonathan.derrick: to_pci_sysdata const argument] Signed-off-by: Jon Derrick --- arch/x86/include/asm/pci.h | 29 + 1 file changed, 13 insertions(+), 16 deletions(-) diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h index 90d0731..a4e09db60 100644 --- a/arch/x86/include/asm/pci.h +++ b/arch/x86/include/asm/pci.h @@ -35,12 +35,15 @@ struct pci_sysdata { #ifdef CONFIG_PCI +static inline struct pci_sysdata *to_pci_sysdata(const struct pci_bus *bus) +{ + return bus->sysdata; +} + #ifdef CONFIG_PCI_DOMAINS static inline int pci_domain_nr(struct pci_bus *bus) { - struct pci_sysdata *sd = bus->sysdata; - - return sd->domain; + return to_pci_sysdata(bus)->domain; } static inline int pci_proc_domain(struct pci_bus *bus) @@ -52,24 +55,20 @@ static inline int pci_proc_domain(struct pci_bus *bus) #ifdef CONFIG_PCI_MSI_IRQ_DOMAIN static inline void *_pci_root_bus_fwnode(struct pci_bus *bus) { - struct pci_sysdata *sd = bus->sysdata; - - return sd->fwnode; + return to_pci_sysdata(bus)->fwnode; } #define pci_root_bus_fwnode_pci_root_bus_fwnode #endif +#if IS_ENABLED(CONFIG_VMD) static inline bool is_vmd(struct pci_bus *bus) { -#if IS_ENABLED(CONFIG_VMD) - struct pci_sysdata *sd = bus->sysdata; - - return sd->vmd_domain; -#else - return false; -#endif + return to_pci_sysdata(bus)->vmd_domain; } +#else +#define is_vmd(bus)false +#endif /* CONFIG_VMD */ /* Can be used to override the logic in pci_scan_bus for skipping already-configured bus numbers - to be used for buggy BIOSes @@ -124,9 +123,7 @@ static inline void early_quirks(void) { } /* Returns the node based on pci bus */ static inline int __pcibus_to_node(const struct pci_bus *bus) { - const struct pci_sysdata *sd = bus->sysdata; - - return sd->node; + return to_pci_sysdata(bus)->node; } static inline const struct cpumask * -- 1.8.3.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 2/5] iommu/arm-smmu: Add support for split pagetables
On 21/01/2020 5:11 pm, Jordan Crouse wrote: [...] I'm looking at iommu_aux_attach_device() and friends, and it appears pretty achievable to hook that up in a workable manner, even if it's just routed straight through to the impl to only work within qcom-specific parameters to begin with. I figure the first aux_attach_dev sanity-checks that the main domain is using TTBR1 with a compatible split, sets TTBR0 and updates the merged TCR value at that point. For subsequent calls it shouldn't need to do much more than sanity-check that a new aux domain has the same parameters as the existing one(s) (and again, such checks could potentially even start out as just "this is OK by construction" comments). I guess we'd probably want a count of the number of 'live' aux domains so we can simply disable TTBR0 on the final aux_detach_dev without having to keep detailed track of whatever the GPU has actually context switched in the hardware. Can you see any holes in that idea? Let me repeat this back just to be sure we're on the same page. When the quirk is enabled on the primary domain, we'll set up TTBR1 and leave TTBR0 disabled. Then, when the first aux domain is attached we will set up that io_ptgable to enable TTBR0 and then let the GPU do what the GPU does until the last aux is detached and we can switch off TTBR0 again. I like this. I'll have to do a bit more exploration because the original aux design assumed that we didn't need to touch the hardware and I'm not sure if there are any resource contention issues between the primary domain and the aux domain. Luckily, these should be solvable if they exist (and the original design didn't take into account the TLB flush problem so this was likely something we had to do anyway). Yeah, sounds like you've got it (somehow I'd completely forgotten that you'd already prototyped the aux domain part, and I only re-read the cover letter after sending that review...). TBH it's not massively different, just being a bit more honest about the intermediate hardware state. As long as we can rely on all aux domains being equivalent and the GPU never writing nonsense to TTBR0, then all arm-smmu really wants to care about is whether there's *something* live or not at any given time, so attach (with quirk) does: TTBR1 = primary_domain->ttbr TCR = primary_domain->tcr | EPD0 then attach_aux comes along and adds: TTBR0 = aux_domain->ttbr TCR = primary_doman->tcr | aux_domain->tcr such that arm-smmu can be happy that TTBR0 is always pointing at *some* valid pagetable from that point on regardless of what subsequently happens underneath, and nobody need touch TCR until the party's completely over. I haven't thought it through in detail, but it also feels like between aux_attach_dev and/or the TTBR1 quirk in attach_dev there ought to be enough information to influence the context bank allocation or shuffle any existing domains such that you can ensure that the right thing ends up in magic context 0 when it needs to be. That could be a pretty neat and robust way to finally put that to bed. I'll try to wrap my brain around this as well. Seems like we could do a magic swizzle of the SID mappings but I'm not sure how we could safely pull that off on an existing domain. Maybe I'm overthinking it. What I'm imagining isn't all that far from how we do normal domain attach, except instead of setting up the newly-allocated context for a new domain you simply clone the existing context into it, and instead of having a given device's set of Stream IDs to retarget you'd just scan though the S2CRs checking cbndx and rewriting as appropriate. Then finally rewrite domain->cfg.cbndx and the old context is all yours. I'll spin up a new copy of the TTBR1 quirk patch and revive the aux domain stuff and then we can go from there. Sounds good, thanks! Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [Patch v3 2/3] iommu: optimize iova_magazine_free_pfns()
On Tue, Jan 21, 2020 at 1:52 AM Robin Murphy wrote: > > On 18/12/2019 4:39 am, Cong Wang wrote: > > If the magazine is empty, iova_magazine_free_pfns() should > > be a nop, however it misses the case of mag->size==0. So we > > should just call iova_magazine_empty(). > > > > This should reduce the contention on iovad->iova_rbtree_lock > > a little bit, not much at all. > > Have you measured that in any way? AFAICS the only time this can get > called with a non-full magazine is in the CPU hotplug callback, where > the impact of taking the rbtree lock and immediately releasing it seems > unlikely to be significant on top of everything else involved in that > operation. This patchset is only tested as a whole, it is not easy to deploy each to production and test it separately. Is there anything wrong to optimize a CPU hotplug path? :) And, it is called in alloc_iova_fast() too when, for example, over-cached. Thanks. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [Patch v3 1/3] iommu: avoid unnecessary magazine allocations
On Tue, Jan 21, 2020 at 3:11 AM Robin Murphy wrote: > > On 18/12/2019 4:39 am, Cong Wang wrote: > > The IOVA cache algorithm implemented in IOMMU code does not > > exactly match the original algorithm described in the paper > > "Magazines and Vmem: Extending the Slab Allocator to Many > > CPUs and Arbitrary Resources". > > > > Particularly, it doesn't need to free the loaded empty magazine > > when trying to put it back to global depot. To make it work, we > > have to pre-allocate magazines in the depot and only recycle them > > when all of them are full. > > > > Before this patch, rcache->depot[] contains either full or > > freed entries, after this patch, it contains either full or > > empty (but allocated) entries. > > How much additional memory overhead does this impose (particularly on > systems that may have many domains mostly used for large, long-term > mappings)? I'm wary that trying to micro-optimise for the "churn network > packets as fast as possible" case may penalise every other case, > potentially quite badly. Lower-end embedded systems are using IOMMUs in > front of their GPUs, video codecs, etc. precisely because they *don't* > have much memory to spare (and thus need to scrape together large > buffers out of whatever pages they can find). The calculation is not complicated: 32 * 6 * 129 * 8 = 198144 bytes, which is roughly 192K, per domain. > > But on the other hand, if we were to go down this route, then why is > there any dynamic allocation/freeing left at all? Once both the depot > and the rcaches are preallocated, then AFAICS it would make more sense > to rework the overflow case in __iova_rcache_insert() to just free the > IOVAs and swap the empty mag around rather than destroying and > recreating it entirely. It's due to the algorithm requires a swap(), which can't be done with statically allocated magzine. I had the same thought initially but gave it up quickly when realized this. If you are suggesting to change the algorithm, it is not a goal of this patchset. I do have plan to search for a better algorithm as the IOMMU performance still sucks (comparing to no IOMMU) after this patchset, but once again, I do not want to change it in this patchset. (My ultimate goal is to find a spinlock-free algorithm, otherwise there is no way to make it close to no-IOMMU performance.) > > Perhaps there's a reasonable compromise wherein we don't preallocate, > but still 'free' empty magazines back to the depot, such that busy > domains will quickly reach a steady-state. In fact, having now dug up > the paper at this point of writing this reply, that appears to be what > fig. 3.1b describes anyway - I don't see any mention of preallocating > the depot. That paper missed a lot of things, it doesn't even recommend a size of a depot or percpu cache. For implementation, we still have to think about those details, including whether to preallocate memory. Thanks. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 2/5] iommu/arm-smmu: Add support for split pagetables
On Tue, Jan 21, 2020 at 02:36:19PM +, Robin Murphy wrote: > On 16/12/2019 4:37 pm, Jordan Crouse wrote: > >Add support to enable split pagetables (TTBR1) if the supporting driver > >requests it via the DOMAIN_ATTR_SPLIT_TABLES flag. When enabled, the driver > >will set up the TTBR0 and TTBR1 regions and program the default domain > >pagetable on TTBR1. > > > >After attaching the device, the value of he domain attribute can > >be queried to see if the split pagetables were successfully programmed. > >Furthermore the domain geometry will be updated so that the caller can > >determine the active region for the pagetable that was programmed. > > > >Signed-off-by: Jordan Crouse > >--- > > > > drivers/iommu/arm-smmu.c | 40 +++- > > drivers/iommu/arm-smmu.h | 45 +++-- > > 2 files changed, 74 insertions(+), 11 deletions(-) > > > >diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c > >index c106406..7b59116 100644 > >--- a/drivers/iommu/arm-smmu.c > >+++ b/drivers/iommu/arm-smmu.c > >@@ -538,9 +538,17 @@ static void arm_smmu_init_context_bank(struct > >arm_smmu_domain *smmu_domain, > > cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr; > > cb->ttbr[1] = 0; > > } else { > >-cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr; > >-cb->ttbr[0] |= FIELD_PREP(TTBRn_ASID, cfg->asid); > >-cb->ttbr[1] = FIELD_PREP(TTBRn_ASID, cfg->asid); > >+if (pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) { > >+cb->ttbr[0] = FIELD_PREP(TTBRn_ASID, cfg->asid); > >+cb->ttbr[1] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr; > >+cb->ttbr[1] |= > >+FIELD_PREP(TTBRn_ASID, cfg->asid); > >+} else { > >+cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr; > >+cb->ttbr[0] |= > >+FIELD_PREP(TTBRn_ASID, cfg->asid); > >+cb->ttbr[1] = FIELD_PREP(TTBRn_ASID, cfg->asid); > >+} > > } > > } else { > > cb->ttbr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vttbr; > >@@ -651,6 +659,7 @@ static int arm_smmu_init_domain_context(struct > >iommu_domain *domain, > > enum io_pgtable_fmt fmt; > > struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); > > struct arm_smmu_cfg *cfg = &smmu_domain->cfg; > >+u32 quirks = 0; > > mutex_lock(&smmu_domain->init_mutex); > > if (smmu_domain->smmu) > >@@ -719,6 +728,8 @@ static int arm_smmu_init_domain_context(struct > >iommu_domain *domain, > > oas = smmu->ipa_size; > > if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) { > > fmt = ARM_64_LPAE_S1; > >+if (smmu_domain->split_pagetables) > >+quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1; > > } else if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_L) { > > fmt = ARM_32_LPAE_S1; > > ias = min(ias, 32UL); > >@@ -788,6 +799,7 @@ static int arm_smmu_init_domain_context(struct > >iommu_domain *domain, > > .coherent_walk = smmu->features & ARM_SMMU_FEAT_COHERENT_WALK, > > .tlb= smmu_domain->flush_ops, > > .iommu_dev = smmu->dev, > >+.quirks = quirks, > > }; > > if (smmu_domain->non_strict) > >@@ -801,8 +813,15 @@ static int arm_smmu_init_domain_context(struct > >iommu_domain *domain, > > /* Update the domain's page sizes to reflect the page table format */ > > domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap; > >-domain->geometry.aperture_end = (1UL << ias) - 1; > >-domain->geometry.force_aperture = true; > >+ > >+if (pgtbl_cfg.quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) { > >+domain->geometry.aperture_start = ~((1ULL << ias) - 1); > > AKA "~0UL << ias", if I'm not mistaken ;) > > >+domain->geometry.aperture_end = ~0UL; > >+} else { > >+domain->geometry.aperture_end = (1UL << ias) - 1; > >+domain->geometry.force_aperture = true; > >+smmu_domain->split_pagetables = false; > >+} > > /* Initialise the context bank with our page table cfg */ > > arm_smmu_init_context_bank(smmu_domain, &pgtbl_cfg); > >@@ -1484,6 +1503,9 @@ static int arm_smmu_domain_get_attr(struct > >iommu_domain *domain, > > case DOMAIN_ATTR_NESTING: > > *(int *)data = (smmu_domain->stage == > > ARM_SMMU_DOMAIN_NESTED); > > return 0; > >+case DOMAIN_ATTR_SPLIT_TABLES: > >+*(int *)data = smmu_domain->split_pagetables; > >+return 0; > > default: > > return -ENODEV; > > }
Re: [PATCH] iommu: amd: Fix IOMMU perf counter clobbering during init
On 1/20/20 7:10 PM, Suravee Suthikulpanit wrote: On 1/17/2020 5:08 PM, Joerg Roedel wrote: Adding Suravee, who wrote the IOMMU Perf Counter code. On Tue, Jan 14, 2020 at 08:12:20AM -0700, Shuah Khan wrote: init_iommu_perf_ctr() clobbers the register when it checks write access to IOMMU perf counters and fails to restore when they are writable. Add save and restore to fix it. Signed-off-by: Shuah Khan --- drivers/iommu/amd_iommu_init.c | 22 -- 1 file changed, 16 insertions(+), 6 deletions(-) Suravee, can you please review this patch? This looks ok. Does this fix certain issues? Or is this just for sanity. I didn't notice any problems. Counters aren't writable on my system. However, it certainly looks like a bog since registers aren't restored like in other places in this file where such checks are done on other registers. I see 2 banks and 4 counters on my system. Is it sufficient to check the first bank and first counter? In other words, if the first one isn't writable, are all counters non-writable? Should we read the config first and then, try to see if any of the counters are writable? I have a patch that does that, I can send it out for review. Reviewed-by: Suravee Suthikulpanit Thanks for the review. thanks, -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 2/5] iommu/arm-smmu: Add support for split pagetables
Oh, one more thing... On 16/12/2019 4:37 pm, Jordan Crouse wrote: [...] @@ -651,6 +659,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, enum io_pgtable_fmt fmt; struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct arm_smmu_cfg *cfg = &smmu_domain->cfg; + u32 quirks = 0; mutex_lock(&smmu_domain->init_mutex); if (smmu_domain->smmu) @@ -719,6 +728,8 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, oas = smmu->ipa_size; if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) { fmt = ARM_64_LPAE_S1; + if (smmu_domain->split_pagetables) + quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1; To avoid me forgetting and questioning it again in future, I'd recommend sticking a comment somewhere near here that we don't reduce cfg->ias in this case because we're currently assuming SEP_UPSTREAM. Robin. } else if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_L) { fmt = ARM_32_LPAE_S1; ias = min(ias, 32UL); ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 2/5] iommu/arm-smmu: Add support for split pagetables
On 16/12/2019 4:37 pm, Jordan Crouse wrote: Add support to enable split pagetables (TTBR1) if the supporting driver requests it via the DOMAIN_ATTR_SPLIT_TABLES flag. When enabled, the driver will set up the TTBR0 and TTBR1 regions and program the default domain pagetable on TTBR1. After attaching the device, the value of he domain attribute can be queried to see if the split pagetables were successfully programmed. Furthermore the domain geometry will be updated so that the caller can determine the active region for the pagetable that was programmed. Signed-off-by: Jordan Crouse --- drivers/iommu/arm-smmu.c | 40 +++- drivers/iommu/arm-smmu.h | 45 +++-- 2 files changed, 74 insertions(+), 11 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index c106406..7b59116 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -538,9 +538,17 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain, cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr; cb->ttbr[1] = 0; } else { - cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr; - cb->ttbr[0] |= FIELD_PREP(TTBRn_ASID, cfg->asid); - cb->ttbr[1] = FIELD_PREP(TTBRn_ASID, cfg->asid); + if (pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) { + cb->ttbr[0] = FIELD_PREP(TTBRn_ASID, cfg->asid); + cb->ttbr[1] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr; + cb->ttbr[1] |= + FIELD_PREP(TTBRn_ASID, cfg->asid); + } else { + cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr; + cb->ttbr[0] |= + FIELD_PREP(TTBRn_ASID, cfg->asid); + cb->ttbr[1] = FIELD_PREP(TTBRn_ASID, cfg->asid); + } } } else { cb->ttbr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vttbr; @@ -651,6 +659,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, enum io_pgtable_fmt fmt; struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct arm_smmu_cfg *cfg = &smmu_domain->cfg; + u32 quirks = 0; mutex_lock(&smmu_domain->init_mutex); if (smmu_domain->smmu) @@ -719,6 +728,8 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, oas = smmu->ipa_size; if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) { fmt = ARM_64_LPAE_S1; + if (smmu_domain->split_pagetables) + quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1; } else if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_L) { fmt = ARM_32_LPAE_S1; ias = min(ias, 32UL); @@ -788,6 +799,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, .coherent_walk = smmu->features & ARM_SMMU_FEAT_COHERENT_WALK, .tlb= smmu_domain->flush_ops, .iommu_dev = smmu->dev, + .quirks = quirks, }; if (smmu_domain->non_strict) @@ -801,8 +813,15 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, /* Update the domain's page sizes to reflect the page table format */ domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap; - domain->geometry.aperture_end = (1UL << ias) - 1; - domain->geometry.force_aperture = true; + + if (pgtbl_cfg.quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) { + domain->geometry.aperture_start = ~((1ULL << ias) - 1); AKA "~0UL << ias", if I'm not mistaken ;) + domain->geometry.aperture_end = ~0UL; + } else { + domain->geometry.aperture_end = (1UL << ias) - 1; + domain->geometry.force_aperture = true; + smmu_domain->split_pagetables = false; + } /* Initialise the context bank with our page table cfg */ arm_smmu_init_context_bank(smmu_domain, &pgtbl_cfg); @@ -1484,6 +1503,9 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain, case DOMAIN_ATTR_NESTING: *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED); return 0; + case DOMAIN_ATTR_SPLIT_TABLES: + *(int *)data = smmu_domain->split_pagetables; + return 0; default: return -ENODEV; } @@ -1524,6 +1546,14 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain, else smmu_domain->stage = ARM_S
Re: [RFC PATCH 2/4] PCI: Add "pci=iommu_passthrough=" parameter for iommu passthrough
[+cc linux-pci, thread at https://lore.kernel.org/r/20200101052648.14295-1-baolu...@linux.intel.com] On Wed, Jan 01, 2020 at 01:26:46PM +0800, Lu Baolu wrote: > The new parameter takes a list of devices separated by a semicolon. > Each device specified will have its iommu_passthrough bit in struct > device set. This is very similar to the existing 'disable_acs_redir' > parameter. Almost all of this patchset is in drivers/iommu. Should the parameter be "iommu ..." instead of "pci=iommu_passthrough=..."? There is already an "iommu.passthrough=" argument. Would this fit better there? Since the iommu_passthrough bit is generic, it seems like you anticipate similar situations for non-PCI devices. > Signed-off-by: Lu Baolu > --- > .../admin-guide/kernel-parameters.txt | 5 +++ > drivers/pci/pci.c | 34 +++ > drivers/pci/pci.h | 1 + > drivers/pci/probe.c | 2 ++ > 4 files changed, 42 insertions(+) > > diff --git a/Documentation/admin-guide/kernel-parameters.txt > b/Documentation/admin-guide/kernel-parameters.txt > index ade4e6ec23e0..d3edc2cb6696 100644 > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt > @@ -3583,6 +3583,11 @@ > may put more devices in an IOMMU group. > force_floating [S390] Force usage of floating interrupts. > nomio [S390] Do not use MIO instructions. > + iommu_passthrough=[; ...] > + Specify one or more PCI devices (in the format > + specified above) separated by semicolons. > + Each device specified will bypass IOMMU DMA > + translation. > > pcie_aspm= [PCIE] Forcibly enable or disable PCIe Active State > Power > Management. > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index 90dbd7c70371..05bf3f4acc36 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -6401,6 +6401,37 @@ void __weak pci_fixup_cardbus(struct pci_bus *bus) > } > EXPORT_SYMBOL(pci_fixup_cardbus); > > +static const char *iommu_passthrough_param; > +bool pci_iommu_passthrough_match(struct pci_dev *dev) > +{ > + int ret = 0; > + const char *p = iommu_passthrough_param; > + > + if (!p) > + return false; > + > + while (*p) { > + ret = pci_dev_str_match(dev, p, &p); > + if (ret < 0) { > + pr_info_once("PCI: Can't parse iommu_passthrough > parameter: %s\n", > + iommu_passthrough_param); > + > + break; > + } else if (ret == 1) { > + pci_info(dev, "PCI: IOMMU passthrough\n"); > + return true; > + } > + > + if (*p != ';' && *p != ',') { > + /* End of param or invalid format */ > + break; > + } > + p++; > + } > + > + return false; > +} > + > static int __init pci_setup(char *str) > { > while (str) { > @@ -6462,6 +6493,8 @@ static int __init pci_setup(char *str) > pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS); > } else if (!strncmp(str, "disable_acs_redir=", 18)) { > disable_acs_redir_param = str + 18; > + } else if (!strncmp(str, "iommu_passthrough=", 18)) { > + iommu_passthrough_param = str + 18; > } else { > pr_err("PCI: Unknown option `%s'\n", str); > } > @@ -6486,6 +6519,7 @@ static int __init pci_realloc_setup_params(void) > resource_alignment_param = kstrdup(resource_alignment_param, > GFP_KERNEL); > disable_acs_redir_param = kstrdup(disable_acs_redir_param, GFP_KERNEL); > + iommu_passthrough_param = kstrdup(iommu_passthrough_param, GFP_KERNEL); > > return 0; > } > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index a0a53bd05a0b..95f6af06aba6 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -288,6 +288,7 @@ void pci_reassigndev_resource_alignment(struct pci_dev > *dev); > void pci_disable_bridge_window(struct pci_dev *dev); > struct pci_bus *pci_bus_get(struct pci_bus *bus); > void pci_bus_put(struct pci_bus *bus); > +bool pci_iommu_passthrough_match(struct pci_dev *dev); > > /* PCIe link information */ > #define PCIE_SPEED2STR(speed) \ > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index 512cb4312ddd..4c571ee75621 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -2404,6 +2404,8 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus > *bus) > > dev->state_saved = false; > > + dev->dev.iommu
Re: [RFC PATCH 3/4] iommu: Preallocate iommu group when probing devices
On 19/01/2020 6:29 am, Lu Baolu wrote: Hi Joerg, On 1/17/20 6:21 PM, Joerg Roedel wrote: On Wed, Jan 01, 2020 at 01:26:47PM +0800, Lu Baolu wrote: This splits iommu group allocation from adding devices. This makes it possible to determine the default domain type for each group as all devices belonging to the group have been determined. I think its better to keep group allocation as it is and just defer default domain allocation after each device is in its group. But take I tried defering default domain allocation, but it seems not possible. The call path of adding devices into their groups: iommu_probe_device -> ops->add_device(dev) -> (iommu vendor driver) iommu_group_get_for_dev(dev) After doing this, the vendor driver will get the default domain and apply dma_ops according to the domain type. If we defer the domain allocation, they will get a NULL default domain and cause panic in the vendor driver. Any suggestions? https://lore.kernel.org/linux-iommu/6dbbfc10-3247-744c-ae8d-443a336e0...@linux.intel.com/ Haven't we been here before? ;) Since we can't (safely or reasonably) change a group's default domain after ops->add_device() has returned, and in general it gets impractical to evaluate "all device in a group" once you look beyond &pci_bus_type (or consider hotplug as mentioned), then AFAICS there's no reasonable way to get away from the default domain type being defined by the first device to attach. But in practice it's hardly a problem anyway - if every device in a given group requests the same domain type then it doesn't matter which comes first, and if they don't then we ultimately end up with an impossible set of constraints, so are doomed to do the 'wrong' thing regardless. Thus unless anyone wants to start redefining the whole group concept to separate the notions of ID aliasing and peer-to-peer isolation (which still wouldn't necessarily help), I think this user override effectively boils down to just another flavour of iommu_request_*_for_dev(), and as such comes right back to the "just pass the bloody device to ops->domain_alloc() and resolve everything up-front" argument. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [Patch v3 1/3] iommu: avoid unnecessary magazine allocations
On 18/12/2019 4:39 am, Cong Wang wrote: The IOVA cache algorithm implemented in IOMMU code does not exactly match the original algorithm described in the paper "Magazines and Vmem: Extending the Slab Allocator to Many CPUs and Arbitrary Resources". Particularly, it doesn't need to free the loaded empty magazine when trying to put it back to global depot. To make it work, we have to pre-allocate magazines in the depot and only recycle them when all of them are full. Before this patch, rcache->depot[] contains either full or freed entries, after this patch, it contains either full or empty (but allocated) entries. How much additional memory overhead does this impose (particularly on systems that may have many domains mostly used for large, long-term mappings)? I'm wary that trying to micro-optimise for the "churn network packets as fast as possible" case may penalise every other case, potentially quite badly. Lower-end embedded systems are using IOMMUs in front of their GPUs, video codecs, etc. precisely because they *don't* have much memory to spare (and thus need to scrape together large buffers out of whatever pages they can find). But on the other hand, if we were to go down this route, then why is there any dynamic allocation/freeing left at all? Once both the depot and the rcaches are preallocated, then AFAICS it would make more sense to rework the overflow case in __iova_rcache_insert() to just free the IOVAs and swap the empty mag around rather than destroying and recreating it entirely. Perhaps there's a reasonable compromise wherein we don't preallocate, but still 'free' empty magazines back to the depot, such that busy domains will quickly reach a steady-state. In fact, having now dug up the paper at this point of writing this reply, that appears to be what fig. 3.1b describes anyway - I don't see any mention of preallocating the depot. Robin. Together with a few other changes to make it exactly match the pseudo code in the paper. Cc: Joerg Roedel Cc: John Garry Signed-off-by: Cong Wang --- drivers/iommu/iova.c | 45 +++- 1 file changed, 28 insertions(+), 17 deletions(-) diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index 41c605b0058f..cb473ddce4cf 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -862,12 +862,16 @@ static void init_iova_rcaches(struct iova_domain *iovad) struct iova_cpu_rcache *cpu_rcache; struct iova_rcache *rcache; unsigned int cpu; - int i; + int i, j; for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) { rcache = &iovad->rcaches[i]; spin_lock_init(&rcache->lock); rcache->depot_size = 0; + for (j = 0; j < MAX_GLOBAL_MAGS; ++j) { + rcache->depot[j] = iova_magazine_alloc(GFP_KERNEL); + WARN_ON(!rcache->depot[j]); + } rcache->cpu_rcaches = __alloc_percpu(sizeof(*cpu_rcache), cache_line_size()); if (WARN_ON(!rcache->cpu_rcaches)) continue; @@ -900,24 +904,30 @@ static bool __iova_rcache_insert(struct iova_domain *iovad, if (!iova_magazine_full(cpu_rcache->loaded)) { can_insert = true; - } else if (!iova_magazine_full(cpu_rcache->prev)) { + } else if (iova_magazine_empty(cpu_rcache->prev)) { swap(cpu_rcache->prev, cpu_rcache->loaded); can_insert = true; } else { - struct iova_magazine *new_mag = iova_magazine_alloc(GFP_ATOMIC); + spin_lock(&rcache->lock); + if (rcache->depot_size < MAX_GLOBAL_MAGS) { + swap(rcache->depot[rcache->depot_size], cpu_rcache->prev); + swap(cpu_rcache->prev, cpu_rcache->loaded); + rcache->depot_size++; + can_insert = true; + } else { + mag_to_free = cpu_rcache->loaded; + } + spin_unlock(&rcache->lock); + + if (mag_to_free) { + struct iova_magazine *new_mag = iova_magazine_alloc(GFP_ATOMIC); - if (new_mag) { - spin_lock(&rcache->lock); - if (rcache->depot_size < MAX_GLOBAL_MAGS) { - rcache->depot[rcache->depot_size++] = - cpu_rcache->loaded; + if (new_mag) { + cpu_rcache->loaded = new_mag; + can_insert = true; } else { - mag_to_free = cpu_rcache->loaded; + mag_to_free = NULL; } - spin_unlock(&rcache->lock); - - cpu_rcache->loaded = new_mag; - can_insert = true;
Re: [RFC PATCH 0/4] iommu: Per-group default domain type
On 21/01/2020 00:43, Lu Baolu wrote: An IOMMU group represents the smallest set of devices that are considered to be isolated. All devices belonging to an IOMMU group share a default domain for DMA APIs. There are two types of default domain: IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_IDENTITY. The former means IOMMU translation, while the latter means IOMMU by-pass. Currently, the default domain type for the IOMMU groups is determined globally. All IOMMU groups use a single default domain type. The global default domain type can be adjusted by kernel build configuration or kernel parameters. More and more users are looking forward to a fine grained default domain type. For example, with the global default domain type set to translation, the OEM verndors or end users might want some trusted and fast-speed devices to bypass IOMMU for performance gains. On the other hand, with global default domain type set to by-pass, some devices with limited system memory addressing capability might want IOMMU translation to remove the bounce buffer overhead. Hi Lu Baolu, Do you think that it would be a more common usecase to want kernel-managed devices to be passthrough for performance reasons and some select devices to be in DMA domain, like those with limited address cap or whose drivers request huge amounts of memory? I just think it would be more manageable to set kernel commandline parameters for this, i.e. those select few which want DMA domain. Hi Baolu, It's just two sides of a coin. Currently, iommu subsystem make DMA domain by default, that's the reason why I selected to let user set which devices are willing to use identity domains. OK, understood. There was an alternate solution here which would allow per-group type to be updated via sysfs: https://lore.kernel.org/linux-iommu/cover.1566353521.git.sai.praneeth.prak...@intel.com/ Any idea what happened to that? Cheers, John ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [Patch v3 3/3] iommu: avoid taking iova_rbtree_lock twice
On 18/12/2019 4:39 am, Cong Wang wrote: Both find_iova() and __free_iova() take iova_rbtree_lock, there is no reason to take and release it twice inside free_iova(). Fold them into one critical section by calling the unlock versions instead. Makes sense to me. Reviewed-by: Robin Murphy Cc: Joerg Roedel Cc: John Garry Signed-off-by: Cong Wang --- drivers/iommu/iova.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index 184d4c0e20b5..f46f8f794678 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -390,10 +390,14 @@ EXPORT_SYMBOL_GPL(__free_iova); void free_iova(struct iova_domain *iovad, unsigned long pfn) { - struct iova *iova = find_iova(iovad, pfn); + unsigned long flags; + struct iova *iova; + spin_lock_irqsave(&iovad->iova_rbtree_lock, flags); + iova = private_find_iova(iovad, pfn); if (iova) - __free_iova(iovad, iova); + private_free_iova(iovad, iova); + spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags); } EXPORT_SYMBOL_GPL(free_iova); ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [Patch v3 2/3] iommu: optimize iova_magazine_free_pfns()
On 18/12/2019 4:39 am, Cong Wang wrote: If the magazine is empty, iova_magazine_free_pfns() should be a nop, however it misses the case of mag->size==0. So we should just call iova_magazine_empty(). This should reduce the contention on iovad->iova_rbtree_lock a little bit, not much at all. Have you measured that in any way? AFAICS the only time this can get called with a non-full magazine is in the CPU hotplug callback, where the impact of taking the rbtree lock and immediately releasing it seems unlikely to be significant on top of everything else involved in that operation. Robin. Cc: Joerg Roedel Cc: John Garry Signed-off-by: Cong Wang --- drivers/iommu/iova.c | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index cb473ddce4cf..184d4c0e20b5 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -797,13 +797,23 @@ static void iova_magazine_free(struct iova_magazine *mag) kfree(mag); } +static bool iova_magazine_full(struct iova_magazine *mag) +{ + return (mag && mag->size == IOVA_MAG_SIZE); +} + +static bool iova_magazine_empty(struct iova_magazine *mag) +{ + return (!mag || mag->size == 0); +} + static void iova_magazine_free_pfns(struct iova_magazine *mag, struct iova_domain *iovad) { unsigned long flags; int i; - if (!mag) + if (iova_magazine_empty(mag)) return; spin_lock_irqsave(&iovad->iova_rbtree_lock, flags); @@ -820,16 +830,6 @@ iova_magazine_free_pfns(struct iova_magazine *mag, struct iova_domain *iovad) mag->size = 0; } -static bool iova_magazine_full(struct iova_magazine *mag) -{ - return (mag && mag->size == IOVA_MAG_SIZE); -} - -static bool iova_magazine_empty(struct iova_magazine *mag) -{ - return (!mag || mag->size == 0); -} - static unsigned long iova_magazine_pop(struct iova_magazine *mag, unsigned long limit_pfn) { ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu