Re: [PATCH v2 01/11] iommu/vt-d: Cache PCI ATS state and Invalidate Queue Depth
Hi Bjorn, On Mon, Jul 27, 2015 at 05:54:53PM -0500, Bjorn Helgaas wrote: Hmm, this is a place where the relaxed error handling in pci_enable_ats() can get problematic. By relaxed error handling, do you mean the fact that in v4.1, pci_enable_ats() has a BUG_ON(is_enabled), while now it merely returns -EINVAL? (BTW, I did change it to add a WARN_ON and return -EBUSY as you suggested.) Okay, great. If ATS is (by accident or a bug elsewhere) already enabled an the function returns -EINVAL, the IOMMU driver considers ATS disabled and doesn't flush the IO/TLBs of the device. This can cause data corruption later on, so we should make sure that info-ats.enabled is consistent with pdev-ats_enabled. I'm not quite sure I understand this. In the patch above, we only set info-ats.enabled = 1 if pci_enable_ats() has succeeded. The amd_iommu.c code is similar. Are you concerned about the case where future code enables ATS before intel-iommu, the pci_enable_ats() in intel-iommu fails, intel-iommu believes ATS is disabled, intel-iommu calls iommu_flush_dev_iotlb(), but it doesn't flush the IOTLB? I guess I could make intel-iommu handle -EBUSY differently from -EINVAL. Would that help? It seems sort of clumsy, but ...? I was concerned that it was harder now to spot bugs in ATS enabling/disabling, when pci_enable_ats just returns -EINVAL when ATS is already enabled. But with the WARN_ON now we will notice these bugs early again, thanks for adding it. Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH char-misc-next 10/19] lib: convert iova.c into a library
On Mon, Jul 27, 2015 at 04:57:32PM -0700, Ashutosh Dixit wrote: From: Harish Chegondi harish.chego...@intel.com This patch converts iova.c into a library, moving it from drivers/iommu/ to lib/, and exports its virtual address allocation and management functions so that other modules can reuse them. Cc: Joerg Roedel j...@8bytes.org Reviewed-by: Anil S Keshavamurthy anil.s.keshavamur...@intel.com Reviewed-by: Sudeep Dutt sudeep.d...@intel.com Signed-off-by: Harish Chegondi harish.chego...@intel.com Where is this going to be used outside of the IOMMU world? Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/5] iommu/io-pgtable-arm: Allow appropriate DMA API use
Hi Robin, On Mon, Jul 27, 2015 at 07:18:08PM +0100, Robin Murphy wrote: Currently, users of the LPAE page table code are (ab)using dma_map_page() as a means to flush page table updates for non-coherent IOMMUs. Since from the CPU's point of view, creating IOMMU page tables *is* passing DMA buffers to a device (the IOMMU's page table walker), there's little reason not to use the DMA API correctly. Allow drivers to opt into appropriate DMA operations for page table allocation and updates by providing the relevant device, and make the flush_pgtable() callback optional in case those DMA API operations are sufficient. The expectation is that an LPAE IOMMU should have a full view of system memory, so use streaming mappings to avoid unnecessary pressure on ZONE_DMA, and treat any DMA translation as a warning sign. Signed-off-by: Robin Murphy robin.mur...@arm.com --- Hi all, Since Russell fixing Tegra[1] reminded me, I dug this out from, er, rather a long time ago[2] and tidied it up. I've tested the SMMUv2 version with the MMU-401s on Juno (both coherent and non-coherent) with no visible regressions; I have the same hope for the SMMUv3 and IPMMU changes since they should be semantically identical. At worst the Renesas driver might need a larger DMA mask setting as per f1d84548694f, but given that there shouldn't be any highmem involved I'd think it should be OK as-is. Robin. [1]:http://article.gmane.org/gmane.linux.ports.tegra/23150 [2]:http://article.gmane.org/gmane.linux.kernel.iommu/8972 drivers/iommu/io-pgtable-arm.c | 107 +++-- drivers/iommu/io-pgtable.h | 2 + 2 files changed, 84 insertions(+), 25 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 4e46021..b93a60e 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -200,12 +200,76 @@ typedef u64 arm_lpae_iopte; static bool selftest_running = false; +static dma_addr_t __arm_lpae_dma(struct device *dev, void *pages) +{ + return phys_to_dma(dev, virt_to_phys(pages)); +} + +static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp, + struct io_pgtable_cfg *cfg, void *cookie) +{ + void *pages = alloc_pages_exact(size, gfp | __GFP_ZERO); + struct device *dev = cfg-iommu_dev; + dma_addr_t dma; + + if (!pages) + return NULL; Missing newline here. + if (dev) { + dma = dma_map_single(dev, pages, size, DMA_TO_DEVICE); + if (dma_mapping_error(dev, dma)) + goto out_free; + /* + * We depend on the IOMMU being able to work with any physical + * address directly, so if the DMA layer suggests it can't by + * giving us back some translation, that bodes very badly... + */ + if (WARN(dma != __arm_lpae_dma(dev, pages), + Cannot accommodate DMA translation for IOMMU page tables\n)) Now that we have a struct device for the iommu, we could use dev_err to make this diagnostic more useful. + goto out_unmap; + } Missing newline again... + if (cfg-tlb-flush_pgtable) Why would you have both a dev and a flush callback? In which cases is the DMA API insufficient? + cfg-tlb-flush_pgtable(pages, size, cookie); ... and here (yeah, pedantry, but consistency keeps this file easier to read). + return pages; + +out_unmap: + dma_unmap_single(dev, dma, size, DMA_TO_DEVICE); +out_free: + free_pages_exact(pages, size); + return NULL; +} + +static void __arm_lpae_free_pages(void *pages, size_t size, + struct io_pgtable_cfg *cfg) +{ + struct device *dev = cfg-iommu_dev; + + if (dev) + dma_unmap_single(dev, __arm_lpae_dma(dev, pages), + size, DMA_TO_DEVICE); + free_pages_exact(pages, size); +} + +static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte, +struct io_pgtable_cfg *cfg, void *cookie) +{ + struct device *dev = cfg-iommu_dev; + + *ptep = pte; + + if (dev) + dma_sync_single_for_device(dev, __arm_lpae_dma(dev, ptep), +sizeof(pte), DMA_TO_DEVICE); + if (cfg-tlb-flush_pgtable) + cfg-tlb-flush_pgtable(ptep, sizeof(pte), cookie); Could we kill the flush_pgtable callback completely and just stick in a dma_wmb() here? Ideally, we've have something like dma_store_release, which we could use to set the parent page table entry, but that's left as a future exercise ;) diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h index 10e32f6..39fe864 100644 --- a/drivers/iommu/io-pgtable.h +++ b/drivers/iommu/io-pgtable.h @@ -41,6 +41,7 @@ struct iommu_gather_ops { * @ias:
Re: [PATCH 2/5] iommu/arm-smmu: Sort out coherency
Hi Robin, Cheers for doing this. Just a few comments (mainly in relation to being consistent with SMMUv3). On Mon, Jul 27, 2015 at 07:18:09PM +0100, Robin Murphy wrote: Currently we detect the whether the SMMU has coherent page table walk we detect the whether? capability from the IDR0.CTTW field, and base our cache maintenance decisions on that. In preparation for fixing the bogus DMA API usage, however, we need to ensure that the DMA API agrees about this, which necessitates deferring to the dma-coherent property in the device tree for the final say. As an added bonus, since systems exist where an external CTTW signal has been tied off incorrectly at integration, allowing DT to override it offers a neat workaround for coherency issues with such SMMUs. Signed-off-by: Robin Murphy robin.mur...@arm.com --- Documentation/devicetree/bindings/iommu/arm,smmu.txt | 4 drivers/iommu/arm-smmu.c | 17 ++--- 2 files changed, 18 insertions(+), 3 deletions(-) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt index 0676050..04724fc 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt @@ -43,6 +43,10 @@ conditions. ** System MMU optional properties: +- dma-coherent : Presence or absence should correctly reflect whether + the SMMU translation table walk interface is + cache-coherent with the CPU. + Please copy the text from: Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt# and remove the bit about stream table accesses. - calxeda,smmu-secure-config-access : Enable proper handling of buggy implementations that always use secure access to SMMU configuration registers. In this case non-secure diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 4cd0c29..9a59bef 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -1532,6 +1532,7 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu) unsigned long size; void __iomem *gr0_base = ARM_SMMU_GR0(smmu); u32 id; + bool cttw_dt, cttw_reg; dev_notice(smmu-dev, probing hardware configuration...\n); dev_notice(smmu-dev, SMMUv%d with:\n, smmu-version); @@ -1571,10 +1572,20 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu) dev_notice(smmu-dev, \taddress translation ops\n); } - if (id ID0_CTTW) { + /* + * In order for DMA API calls to work properly, we must defer to what + * the DT says about coherency, regardless of what the hardware claims. + * Fortunately, this also opens up a workaround for systems where the + * ID register value has ended up configured incorrectly. + */ + cttw_dt = of_property_read_bool(smmu-dev-of_node, dma-coherent); Use of_dma_is_coherent(smmu-dev-of_node) + if (cttw_dt) smmu-features |= ARM_SMMU_FEAT_COHERENT_WALK; - dev_notice(smmu-dev, \tcoherent table walk\n); - } + cttw_reg = !!(id ID0_CTTW); + if (cttw_dt || cttw_reg) + dev_notice(smmu-dev, \t%scoherent table walk%s\n, +cttw_dt ? : no , +(cttw_dt == cttw_reg) ? : (overridden by DT)); Similarly here; I think we should just follow the message printed by the SMMUv3 driver but s/IDR0.COHACC/ID0.CTTW/. Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 3/6] iommu: add ARM short descriptor page table allocator.
On Tue, Jul 28, 2015 at 02:37:43PM +0100, Yong Wu wrote: On Tue, 2015-07-28 at 12:00 +0100, Will Deacon wrote: On Tue, Jul 28, 2015 at 06:08:14AM +0100, Yong Wu wrote: On Mon, 2015-07-27 at 15:11 +0100, Will Deacon wrote: On Mon, Jul 27, 2015 at 03:05:38PM +0100, Robin Murphy wrote: On 27/07/15 05:21, Yong Wu wrote: + } else {/* page or largepage */ + if (quirk IO_PGTABLE_QUIRK_SHORT_MTK) { + if (large) { /* special Bit */ This definitely needs a better comment! What exactly are you doing here and what is that quirk all about? I use this quirk is for MTK Special Bit as we don't have the XN bit in pagetable. I'm still not really clear about what this is. There is some difference between the standard spec and MTK HW, Our hw don't implement some bits, like XN and AP. So I add a quirk for MTK special. When you say it doesn't implement these bits, do you mean that having them set will lead to Bad Things happening in the hardware, or that it will simply ignore them and not enforce any of the protections they imply? The former case would definitely want clearly documenting somewhere, whereas for the latter case I'm not sure it's even worth the complication of having a quirk - if the value doesn't matter there seems little point in doing a special dance just for the sake of semantic correctness of the in-memory PTEs, in my opinion. Agreed. We should only use quirks if the current (architecturally compliant) code causes real issues with the hardware. Then the quirk can be used to either avoid the problematic routines or to take extra steps to make things work as the architecture intended. I've asked how this IOMMU differs from the architecture on a number of occasions, but I'm still yet to receive a response other than it's special. After check further with DE, Our pagetable is refer to ARM-v7's short-descriptor which is a little different from ARM-v8. like bit0(PXN) in section and supersection, I didn't read ARM-v7 spec before, so I add a MTK quirk to disable PXN bit in section and supersection.(if the PXN bit is wrote in ARM-v7 spec, HW will page fault.) I've been reviewing this using the ARMv7 ARM (Rev.C of DDI0406C) the whole time. PXN is there as an optional field in non-LPAE implementations. That's fine and doesn't require any quirks. Thanks for your confirm. Then I delete all the PXN bit in here? Take a example, #define ARM_SHORT_PGD_SECTION_XN (BIT(0) | BIT(4)) I will change it to BIT(4). Yes. Then the PXN bit can be added later as a quirk when we have an implementation that supports it. Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH char-misc-next 10/19] lib: convert iova.c into a library
On Tue, 2015-07-28 at 11:41 +0100, Robin Murphy wrote: On 28/07/15 11:03, Joerg Roedel wrote: On Mon, Jul 27, 2015 at 04:57:32PM -0700, Ashutosh Dixit wrote: From: Harish Chegondi harish.chego...@intel.com This patch converts iova.c into a library, moving it from drivers/iommu/ to lib/, and exports its virtual address allocation and management functions so that other modules can reuse them. Cc: Joerg Roedel j...@8bytes.org Reviewed-by: Anil S Keshavamurthy anil.s.keshavamur...@intel.com Reviewed-by: Sudeep Dutt sudeep.d...@intel.com Signed-off-by: Harish Chegondi harish.chego...@intel.com Where is this going to be used outside of the IOMMU world? ...and how does it relate to the patches from Sakari (+CC) doing much the same thing[1]? I merged Sakari's patches into the intel-iommu git tree today, FWIW. If there's really a need to move it from drivers/iommu/ to lib/ then we could feasibly do that too. -- David WoodhouseOpen Source Technology Centre david.woodho...@intel.com Intel Corporation smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 3/5] iommu/arm-smmu: Clean up DMA API usage
On 28/07/15 11:18, Will Deacon wrote: On Mon, Jul 27, 2015 at 07:18:10PM +0100, Robin Murphy wrote: With the correct DMA API calls now integrated into the io-pgtable code, let that handle the flushing of non-coherent page table updates. Signed-off-by: Robin Murphy robin.mur...@arm.com --- drivers/iommu/arm-smmu.c | 23 +++ 1 file changed, 7 insertions(+), 16 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 9a59bef..9501668 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -610,24 +610,13 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size, static void arm_smmu_flush_pgtable(void *addr, size_t size, void *cookie) { struct arm_smmu_domain *smmu_domain = cookie; - struct arm_smmu_device *smmu = smmu_domain-smmu; - unsigned long offset = (unsigned long)addr ~PAGE_MASK; - - /* Ensure new page tables are visible to the hardware walker */ - if (smmu-features ARM_SMMU_FEAT_COHERENT_WALK) { + /* +* Ensure new page tables are visible to a coherent hardware walker. +* The page table code deals with flushing for the non-coherent case. +*/ + if (smmu_domain-smmu-features ARM_SMMU_FEAT_COHERENT_WALK) dsb(ishst); - } else { - /* -* If the SMMU can't walk tables in the CPU caches, treat them -* like non-coherent DMA since we need to flush the new entries -* all the way out to memory. There's no possibility of -* recursion here as the SMMU table walker will not be wired -* through another SMMU. -*/ - dma_map_page(smmu-dev, virt_to_page(addr), offset, size, -DMA_TO_DEVICE); - } } static struct iommu_gather_ops arm_smmu_gather_ops = { @@ -899,6 +888,8 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, .oas= oas, .tlb= arm_smmu_gather_ops, }; + if (!(smmu-features ARM_SMMU_FEAT_COHERENT_WALK)) + pgtbl_cfg.iommu_dev = smmu-dev; Ideally, I'd like to set the dev unconditionally and kill the flush_pgtable callback. See my comments on the other patch. I'm not sure setting dev unconditionally would be right - we'd end up doing a load of busy work in the DMA API to decide to do nothing and never get a barrier out of it. We could have a bool coherent in the cfg to make the wmb() happen, but the mere absence of dev would do that job just as well (wmb() isn't going to need it to report anything, unlike the DMA API path); either way flush_pgtable can go. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 00/11] PCI: Fix ATS deadlock
Hi Bjorn, On Mon, Jul 20, 2015 at 07:13:49PM -0500, Bjorn Helgaas wrote: Bjorn Helgaas (11): iommu/vt-d: Cache PCI ATS state and Invalidate Queue Depth PCI: Allocate ATS struct during enumeration PCI: Embed ATS info directly into struct pci_dev PCI: Reduce size of ATS structure elements PCI: Rationalize pci_ats_queue_depth() error checking PCI: Inline the ATS setup code into pci_ats_init() PCI: Use pci_physfn() rather than looking up physfn by hand PCI: Clean up ATS error handling PCI: Move ATS declarations to linux/pci.h so they're all together PCI: Stop caching ATS Invalidate Queue Depth PCI: Remove pci_ats_enabled() So while you are at working on ATS, could we probably also introduce a way to globally disable ATS on a box from the kernel command line (pci=noats?)? ATS is basically a way for a device to tunnel through the IOMMU without access checking (because pre-translated requests are not checked again). For security reasons it would be good to have a way to disable ATS completly if desired. Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/5] iommu/arm-smmu: Sort out coherency
Hi Will, On 28/07/15 10:43, Will Deacon wrote: Hi Robin, Cheers for doing this. Just a few comments (mainly in relation to being consistent with SMMUv3). Thanks! On Mon, Jul 27, 2015 at 07:18:09PM +0100, Robin Murphy wrote: Currently we detect the whether the SMMU has coherent page table walk we detect the whether? Er, yeah, looks like rain tomorrow afternoon :P capability from the IDR0.CTTW field, and base our cache maintenance decisions on that. In preparation for fixing the bogus DMA API usage, however, we need to ensure that the DMA API agrees about this, which necessitates deferring to the dma-coherent property in the device tree for the final say. As an added bonus, since systems exist where an external CTTW signal has been tied off incorrectly at integration, allowing DT to override it offers a neat workaround for coherency issues with such SMMUs. Signed-off-by: Robin Murphy robin.mur...@arm.com --- Documentation/devicetree/bindings/iommu/arm,smmu.txt | 4 drivers/iommu/arm-smmu.c | 17 ++--- 2 files changed, 18 insertions(+), 3 deletions(-) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt index 0676050..04724fc 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt @@ -43,6 +43,10 @@ conditions. ** System MMU optional properties: +- dma-coherent : Presence or absence should correctly reflect whether + the SMMU translation table walk interface is + cache-coherent with the CPU. + Please copy the text from: Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt# and remove the bit about stream table accesses. Done, although since the resulting ...DMA operations made by the SMMU (page table walks)... is both redundant and horribly clunky it gets rewritten as ...page table walks made by the SMMU... - calxeda,smmu-secure-config-access : Enable proper handling of buggy implementations that always use secure access to SMMU configuration registers. In this case non-secure diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 4cd0c29..9a59bef 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -1532,6 +1532,7 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu) unsigned long size; void __iomem *gr0_base = ARM_SMMU_GR0(smmu); u32 id; + bool cttw_dt, cttw_reg; dev_notice(smmu-dev, probing hardware configuration...\n); dev_notice(smmu-dev, SMMUv%d with:\n, smmu-version); @@ -1571,10 +1572,20 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu) dev_notice(smmu-dev, \taddress translation ops\n); } - if (id ID0_CTTW) { + /* +* In order for DMA API calls to work properly, we must defer to what +* the DT says about coherency, regardless of what the hardware claims. +* Fortunately, this also opens up a workaround for systems where the +* ID register value has ended up configured incorrectly. +*/ + cttw_dt = of_property_read_bool(smmu-dev-of_node, dma-coherent); Use of_dma_is_coherent(smmu-dev-of_node) *facepalm* + if (cttw_dt) smmu-features |= ARM_SMMU_FEAT_COHERENT_WALK; - dev_notice(smmu-dev, \tcoherent table walk\n); - } + cttw_reg = !!(id ID0_CTTW); + if (cttw_dt || cttw_reg) + dev_notice(smmu-dev, \t%scoherent table walk%s\n, + cttw_dt ? : no , + (cttw_dt == cttw_reg) ? : (overridden by DT)); Similarly here; I think we should just follow the message printed by the SMMUv3 driver but s/IDR0.COHACC/ID0.CTTW/. I note that the v3 driver stays silent as long as the DT and hardware agree - I'm reticent to entirely remove the original message here as it's proved rather useful (especially in debugging-over-email exchanges) to see at a glance from a boot log whether the driver thinks a particular SMMU is coherent or not (particularly with my SMMUv1 many-instances-in-the-same-system hat on). I can certainly make the override message more consistent, though. Robin ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/5] iommu/io-pgtable-arm: Allow appropriate DMA API use
On Tue, Jul 28, 2015 at 04:39:23PM +0100, Robin Murphy wrote: On 28/07/15 11:17, Will Deacon wrote: + if (cfg-tlb-flush_pgtable) Why would you have both a dev and a flush callback? In which cases is the DMA API insufficient? a) Bisectability given existing users. You could still make it an if (dev) .. else if (flush_pgtable) ... though. Then we could put the wmb() in the if () clause after the DMA-api calls and remove the conditional once everybody has been ported over. +static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte, + struct io_pgtable_cfg *cfg, void *cookie) +{ + struct device *dev = cfg-iommu_dev; + + *ptep = pte; + + if (dev) + dma_sync_single_for_device(dev, __arm_lpae_dma(dev, ptep), + sizeof(pte), DMA_TO_DEVICE); + if (cfg-tlb-flush_pgtable) + cfg-tlb-flush_pgtable(ptep, sizeof(pte), cookie); Could we kill the flush_pgtable callback completely and just stick in a dma_wmb() here? Ideally, we've have something like dma_store_release, which we could use to set the parent page table entry, but that's left as a future exercise ;) I couldn't convince myself that there wouldn't never be some weird case where an IOMMU driver still needs to do something funky for a coherent device, so I left it in. Given that the existing implementations are either dsb or nothing, however, I agree it may be worth taking out now and only bringing back later if proven necessary. Yeah, let's keep it as simple as we can and avoid giving people callbacks unless they actually need them. I would think we'd need at least wmb() though, since dma_wmb() only gives us a dmb on arm64; if the PTE is going from invalid to valid (i.e. no TLB involved), we'd have the normal cacheable write of the PTE potentially racing with an MMIO write after we return (the do DMA with this address command to the master) and I think we might need a dsb to avoid that - if the PTE write hasn't got far enough for the IOMMU to snoop it, the walk hits the stale invalid entry, aborts the incoming transaction and kills the device. Yes, you're right. I was in a CPU-centric mindset and forgot that we can't deal with transient faults when going from invalid - valid for a device mapping. Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 3/5] iommu/amd: Allow non-IOMMUv2 devices in IOMMUv2 domains
From: Joerg Roedel jroe...@suse.de Since devices with IOMMUv2 functionality might be in the same group as devices without it, allow those devices in IOMMUv2 domains too. Otherwise attaching the group with the IOMMUv2 device to the domain will fail. Signed-off-by: Joerg Roedel jroe...@suse.de --- drivers/iommu/amd_iommu.c | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 89e6d4b..6d3dae9 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -2164,15 +2164,17 @@ static int attach_device(struct device *dev, dev_data = get_dev_data(dev); if (domain-flags PD_IOMMUV2_MASK) { - if (!dev_data-iommu_v2 || !dev_data-passthrough) + if (!dev_data-passthrough) return -EINVAL; - if (pdev_iommuv2_enable(pdev) != 0) - return -EINVAL; + if (dev_data-iommu_v2) { + if (pdev_iommuv2_enable(pdev) != 0) + return -EINVAL; - dev_data-ats.enabled = true; - dev_data-ats.qdep= pci_ats_queue_depth(pdev); - dev_data-pri_tlp = pci_pri_tlp_required(pdev); + dev_data-ats.enabled = true; + dev_data-ats.qdep= pci_ats_queue_depth(pdev); + dev_data-pri_tlp = pci_pri_tlp_required(pdev); + } } else if (amd_iommu_iotlb_sup pci_enable_ats(pdev, PAGE_SHIFT) == 0) { dev_data-ats.enabled = true; @@ -2237,7 +2239,7 @@ static void detach_device(struct device *dev) __detach_device(dev_data); write_unlock_irqrestore(amd_iommu_devtable_lock, flags); - if (domain-flags PD_IOMMUV2_MASK) + if (domain-flags PD_IOMMUV2_MASK dev_data-iommu_v2) pdev_iommuv2_disable(to_pci_dev(dev)); else if (dev_data-ats.enabled) pci_disable_ats(to_pci_dev(dev)); -- 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 2/5] iommu/amd: Use iommu core for passthrough mode
From: Joerg Roedel jroe...@suse.de Remove the AMD IOMMU driver implementation for passthrough mode and rely on the new iommu core features for that. Signed-off-by: Joerg Roedel jroe...@suse.de --- drivers/iommu/amd_iommu.c | 58 ++ drivers/iommu/amd_iommu_init.c | 10 +--- 2 files changed, 3 insertions(+), 65 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index a57e9b7..89e6d4b 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -76,8 +76,6 @@ LIST_HEAD(hpet_map); * Domain for untranslated devices - only allocated * if iommu=pt passed on kernel cmd line. */ -static struct protection_domain *pt_domain; - static const struct iommu_ops amd_iommu_ops; static ATOMIC_NOTIFIER_HEAD(ppr_notifier); @@ -96,7 +94,7 @@ struct iommu_dev_data { struct protection_domain *domain; /* Domain the device is bound to */ u16 devid;/* PCI Device ID */ bool iommu_v2;/* Device can make use of IOMMUv2 */ - bool passthrough; /* Default for device is pt_domain */ + bool passthrough; /* Device is identity mapped */ struct { bool enabled; int qdep; @@ -116,7 +114,6 @@ struct iommu_cmd { struct kmem_cache *amd_iommu_irq_cache; static void update_domain(struct protection_domain *domain); -static int alloc_passthrough_domain(void); static int protection_domain_init(struct protection_domain *domain); / @@ -2221,15 +2218,6 @@ static void __detach_device(struct iommu_dev_data *dev_data) do_detach(head); spin_unlock_irqrestore(domain-lock, flags); - - /* -* If we run in passthrough mode the device must be assigned to the -* passthrough domain if it is detached from any other domain. -* Make sure we can deassign from the pt_domain itself. -*/ - if (dev_data-passthrough - (dev_data-domain == NULL domain != pt_domain)) - __attach_device(dev_data, pt_domain); } /* @@ -2287,7 +2275,7 @@ static int amd_iommu_add_device(struct device *dev) BUG_ON(!dev_data); - if (dev_data-iommu_v2) + if (iommu_pass_through || dev_data-iommu_v2) iommu_request_dm_for_dev(dev); /* Domains are initialized for this device - have a look what we ended up with */ @@ -2947,21 +2935,6 @@ out_err: return NULL; } -static int alloc_passthrough_domain(void) -{ - if (pt_domain != NULL) - return 0; - - /* allocate passthrough domain */ - pt_domain = protection_domain_alloc(); - if (!pt_domain) - return -ENOMEM; - - pt_domain-mode = PAGE_MODE_NONE; - - return 0; -} - static struct iommu_domain *amd_iommu_domain_alloc(unsigned type) { struct protection_domain *pdomain; @@ -3222,33 +3195,6 @@ static const struct iommu_ops amd_iommu_ops = { * */ -int __init amd_iommu_init_passthrough(void) -{ - struct iommu_dev_data *dev_data; - struct pci_dev *dev = NULL; - int ret; - - ret = alloc_passthrough_domain(); - if (ret) - return ret; - - for_each_pci_dev(dev) { - if (!check_device(dev-dev)) - continue; - - dev_data = get_dev_data(dev-dev); - dev_data-passthrough = true; - - attach_device(dev-dev, pt_domain); - } - - amd_iommu_stats_init(); - - pr_info(AMD-Vi: Initialized for Passthrough Mode\n); - - return 0; -} - /* IOMMUv2 specific functions */ int amd_iommu_register_ppr_notifier(struct notifier_block *nb) { diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index dbda9ae..a24495e 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -2026,14 +2026,6 @@ static bool detect_ivrs(void) return true; } -static int amd_iommu_init_dma(void) -{ - if (iommu_pass_through) - return amd_iommu_init_passthrough(); - else - return amd_iommu_init_dma_ops(); -} - / * * AMD IOMMU Initialization State Machine @@ -2073,7 +2065,7 @@ static int __init state_next(void) init_state = ret ? IOMMU_INIT_ERROR : IOMMU_INTERRUPTS_EN; break; case IOMMU_INTERRUPTS_EN: - ret = amd_iommu_init_dma(); + ret = amd_iommu_init_dma_ops(); init_state = ret ? IOMMU_INIT_ERROR : IOMMU_DMA_OPS; break; case IOMMU_DMA_OPS: -- 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org
[PATCH 4/5] iommu/amd: Use swiotlb in passthrough mode
From: Joerg Roedel jroe...@suse.de In passthrough mode (iommu=pt) all devices are identity mapped. If a device does not support 64bit DMA it might still need remapping. Make sure swiotlb is initialized to provide this remapping. Signed-off-by: Joerg Roedel jroe...@suse.de --- drivers/iommu/amd_iommu.c | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 6d3dae9..e29baa6 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -2282,12 +2282,10 @@ static int amd_iommu_add_device(struct device *dev) /* Domains are initialized for this device - have a look what we ended up with */ domain = iommu_get_domain_for_dev(dev); - if (domain-type == IOMMU_DOMAIN_IDENTITY) { + if (domain-type == IOMMU_DOMAIN_IDENTITY) dev_data-passthrough = true; - dev-archdata.dma_ops = nommu_dma_ops; - } else { + else dev-archdata.dma_ops = amd_iommu_dma_ops; - } out: iommu_completion_wait(iommu); @@ -2852,8 +2850,8 @@ int __init amd_iommu_init_api(void) int __init amd_iommu_init_dma_ops(void) { + swiotlb= iommu_pass_through ? 1 : 0; iommu_detected = 1; - swiotlb = 0; amd_iommu_stats_init(); -- 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/5] iommu/io-pgtable-arm: Allow appropriate DMA API use
On 28/07/15 11:17, Will Deacon wrote: [...] diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 4e46021..b93a60e 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -200,12 +200,76 @@ typedef u64 arm_lpae_iopte; static bool selftest_running = false; +static dma_addr_t __arm_lpae_dma(struct device *dev, void *pages) +{ + return phys_to_dma(dev, virt_to_phys(pages)); +} + +static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp, + struct io_pgtable_cfg *cfg, void *cookie) +{ + void *pages = alloc_pages_exact(size, gfp | __GFP_ZERO); + struct device *dev = cfg-iommu_dev; + dma_addr_t dma; + + if (!pages) + return NULL; Missing newline here. Coding style flamewar, go! Or not, since I have neither the energy or inclination, so I'll go with the status quo. (Here we come and here we come and here we go etc.) + if (dev) { + dma = dma_map_single(dev, pages, size, DMA_TO_DEVICE); + if (dma_mapping_error(dev, dma)) + goto out_free; + /* +* We depend on the IOMMU being able to work with any physical +* address directly, so if the DMA layer suggests it can't by +* giving us back some translation, that bodes very badly... +*/ + if (WARN(dma != __arm_lpae_dma(dev, pages), +Cannot accommodate DMA translation for IOMMU page tables\n)) Now that we have a struct device for the iommu, we could use dev_err to make this diagnostic more useful. Good point. + goto out_unmap; + } Missing newline again... + if (cfg-tlb-flush_pgtable) Why would you have both a dev and a flush callback? In which cases is the DMA API insufficient? a) Bisectability given existing users. b) When the device is coherent, the DMA API stuff should be a nop even if a device was provided, therefore some other synchronisation is needed. + cfg-tlb-flush_pgtable(pages, size, cookie); ... and here (yeah, pedantry, but consistency keeps this file easier to read). + return pages; + +out_unmap: + dma_unmap_single(dev, dma, size, DMA_TO_DEVICE); +out_free: + free_pages_exact(pages, size); + return NULL; +} + +static void __arm_lpae_free_pages(void *pages, size_t size, + struct io_pgtable_cfg *cfg) +{ + struct device *dev = cfg-iommu_dev; + + if (dev) + dma_unmap_single(dev, __arm_lpae_dma(dev, pages), +size, DMA_TO_DEVICE); + free_pages_exact(pages, size); +} + +static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte, + struct io_pgtable_cfg *cfg, void *cookie) +{ + struct device *dev = cfg-iommu_dev; + + *ptep = pte; + + if (dev) + dma_sync_single_for_device(dev, __arm_lpae_dma(dev, ptep), + sizeof(pte), DMA_TO_DEVICE); + if (cfg-tlb-flush_pgtable) + cfg-tlb-flush_pgtable(ptep, sizeof(pte), cookie); Could we kill the flush_pgtable callback completely and just stick in a dma_wmb() here? Ideally, we've have something like dma_store_release, which we could use to set the parent page table entry, but that's left as a future exercise ;) I couldn't convince myself that there wouldn't never be some weird case where an IOMMU driver still needs to do something funky for a coherent device, so I left it in. Given that the existing implementations are either dsb or nothing, however, I agree it may be worth taking out now and only bringing back later if proven necessary. I would think we'd need at least wmb() though, since dma_wmb() only gives us a dmb on arm64; if the PTE is going from invalid to valid (i.e. no TLB involved), we'd have the normal cacheable write of the PTE potentially racing with an MMIO write after we return (the do DMA with this address command to the master) and I think we might need a dsb to avoid that - if the PTE write hasn't got far enough for the IOMMU to snoop it, the walk hits the stale invalid entry, aborts the incoming transaction and kills the device. diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h index 10e32f6..39fe864 100644 --- a/drivers/iommu/io-pgtable.h +++ b/drivers/iommu/io-pgtable.h @@ -41,6 +41,7 @@ struct iommu_gather_ops { * @ias: Input address (iova) size, in bits. * @oas: Output address (paddr) size, in bits. * @tlb: TLB management callbacks for this set of tables. + * @iommu_dev: The owner of the page table memory (for DMA purposes). */ struct io_pgtable_cfg { #define IO_PGTABLE_QUIRK_ARM_NS (1 0) /* Set NS bit in PTEs */ @@ -49,6 +50,7 @@ struct io_pgtable_cfg { unsigned int
[PATCH 5/5] iommu/amd: Set global dma_ops if swiotlb is disabled
From: Joerg Roedel jroe...@suse.de Some AMD systems also have non-PCI devices which can do DMA. Those can't be handled by the AMD IOMMU, as the hardware can only handle PCI. These devices would end up with no dma_ops, as neither the per-device nor the global dma_ops will get set. SWIOTLB provides global dma_ops when it is active, so make sure there are global dma_ops too when swiotlb is disabled. Signed-off-by: Joerg Roedel jroe...@suse.de --- drivers/iommu/amd_iommu.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index e29baa6..fa9508b 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -2853,6 +2853,15 @@ int __init amd_iommu_init_dma_ops(void) swiotlb= iommu_pass_through ? 1 : 0; iommu_detected = 1; + /* +* In case we don't initialize SWIOTLB (actually the common case +* when AMD IOMMU is enabled), make sure there are global +* dma_ops set as a fall-back for devices not handled by this +* driver (for example non-PCI devices). +*/ + if (!swiotlb) + dma_ops = nommu_dma_ops; + amd_iommu_stats_init(); if (amd_iommu_unmap_flush) -- 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/5] iommu/amd: Use iommu_attach_group()
From: Joerg Roedel jroe...@suse.de Since the conversion to default domains the iommu_attach_device function only works for devices with their own group. But this isn't always true for current IOMMUv2 capable devices, so use iommu_attach_group instead. Signed-off-by: Joerg Roedel jroe...@suse.de --- drivers/iommu/amd_iommu_v2.c | 24 +--- 1 file changed, 21 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c index 3465faf..f7b875b 100644 --- a/drivers/iommu/amd_iommu_v2.c +++ b/drivers/iommu/amd_iommu_v2.c @@ -132,11 +132,19 @@ static struct device_state *get_device_state(u16 devid) static void free_device_state(struct device_state *dev_state) { + struct iommu_group *group; + /* * First detach device from domain - No more PRI requests will arrive * from that device after it is unbound from the IOMMUv2 domain. */ - iommu_detach_device(dev_state-domain, dev_state-pdev-dev); + group = iommu_group_get(dev_state-pdev-dev); + if (WARN_ON(!group)) + return; + + iommu_detach_group(dev_state-domain, group); + + iommu_group_put(group); /* Everything is down now, free the IOMMUv2 domain */ iommu_domain_free(dev_state-domain); @@ -731,6 +739,7 @@ EXPORT_SYMBOL(amd_iommu_unbind_pasid); int amd_iommu_init_device(struct pci_dev *pdev, int pasids) { struct device_state *dev_state; + struct iommu_group *group; unsigned long flags; int ret, tmp; u16 devid; @@ -776,10 +785,16 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids) if (ret) goto out_free_domain; - ret = iommu_attach_device(dev_state-domain, pdev-dev); - if (ret != 0) + group = iommu_group_get(pdev-dev); + if (!group) goto out_free_domain; + ret = iommu_attach_group(dev_state-domain, group); + if (ret != 0) + goto out_drop_group; + + iommu_group_put(group); + spin_lock_irqsave(state_lock, flags); if (__get_device_state(devid) != NULL) { @@ -794,6 +809,9 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids) return 0; +out_drop_group: + iommu_group_put(group); + out_free_domain: iommu_domain_free(dev_state-domain); -- 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 0/5] AMD IOMMU Fixes for v4.2-rc4
Hi, here are a couple of fixes for the AMD IOMMU driver for issues found recently. The issues were introduced by the default-domain conversion in this development cycle. I plan to send these fixes upstream by the end of the week. Please review! Thanks, Joerg Joerg Roedel (5): iommu/amd: Use iommu_attach_group() iommu/amd: Use iommu core for passthrough mode iommu/amd: Allow non-IOMMUv2 devices in IOMMUv2 domains iommu/amd: Use swiotlb in passthrough mode iommu/amd: Set global dma_ops if swiotlb is disabled drivers/iommu/amd_iommu.c | 91 +++--- drivers/iommu/amd_iommu_init.c | 10 + drivers/iommu/amd_iommu_v2.c | 24 +-- 3 files changed, 45 insertions(+), 80 deletions(-) -- 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 0/5] iommu/arm-smmu: add support for non-pci devices
On Tue, Jul 21, 2015 at 11:30:07AM +0100, Robin Murphy wrote: On 21/07/15 08:30, Zhen Lei wrote: Changelog: v2 - v3: 1. add support for pci device hotplug, which missed in patch v2. 2. only support #iommu-cells = 1, add corresponding description in arm,smmu-v3.txt. 3. add function find_smmu_by_device which extracted from find_smmu_by_node, to resolve the problem mentioned by Robin Murphy in [PATCH v2 7/9]. Additionally: +platform_set_drvdata(pdev, smmu); //Patch v2 + dev-archdata.iommu = smmu; //Patch v3, dev = pdev-dev I didn't give any Reviewed-by tags, much less to revised patches that I've not even looked at yet; please see section 13 of Documentation/SubmittingPatches for what the Reviewed-by tag means. That said, it would be great if you could review the patches anyway ;) Also, it looks like this is all based on Laurent's IOMMU probe deferral series which hasn't seen any movement for a while. Laurent: are you planning to repost that, or is it worth us picking up the original version and bashing it into shape ourselves? Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 0/4] arm64: IOMMU-backed DMA mapping
Hi Joerg, Robin, On Mon, Jul 20, 2015 at 06:23:35PM +0100, Robin Murphy wrote: On 20/07/15 16:26, Joerg Roedel wrote: On Thu, Jul 16, 2015 at 07:40:11PM +0100, Robin Murphy wrote: arch/arm64/Kconfig | 1 + arch/arm64/include/asm/dma-mapping.h | 15 +- arch/arm64/mm/dma-mapping.c | 452 + What happened to the plan to merge this with the existing iommu-based dma-api implementation for 32 bit ARM? The issue currently is that there are a bunch of drivers using the exported arm_iommu_* functions directly. From what I can tell, they seem like they could probably all be converted to using default domains and/or the new domain type abstractions via the core IOMMU API, which would then allow killing off dma_iommu_mapping and rewriting the arch/arm implementation to use the new shared code. I don't currently have any 32-bit platform to test with, so I'm a little dubious of taking that all on myself right now. In the meantime on arm64, DMA mapping ops are needed for SMMUv3 platform device support, the Mediatek M4U patches and my own SMMUv2 work, so it would be very useful to get the arm64 and common code in as a first step, then look at cleaning up arch/arm for 4.4 without dangling dependencies. What's the plan with this? Do we need to port 32-bit ARM before it's a candidate for merging, or can we push ahead with getting this up and running for arm64 (which currently can't use an IOMMU for DMA)? Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 3/6] iommu: add ARM short descriptor page table allocator.
On Tue, Jul 28, 2015 at 06:08:14AM +0100, Yong Wu wrote: On Mon, 2015-07-27 at 15:11 +0100, Will Deacon wrote: On Mon, Jul 27, 2015 at 03:05:38PM +0100, Robin Murphy wrote: On 27/07/15 05:21, Yong Wu wrote: + } else {/* page or largepage */ + if (quirk IO_PGTABLE_QUIRK_SHORT_MTK) { + if (large) { /* special Bit */ This definitely needs a better comment! What exactly are you doing here and what is that quirk all about? I use this quirk is for MTK Special Bit as we don't have the XN bit in pagetable. I'm still not really clear about what this is. There is some difference between the standard spec and MTK HW, Our hw don't implement some bits, like XN and AP. So I add a quirk for MTK special. When you say it doesn't implement these bits, do you mean that having them set will lead to Bad Things happening in the hardware, or that it will simply ignore them and not enforce any of the protections they imply? The former case would definitely want clearly documenting somewhere, whereas for the latter case I'm not sure it's even worth the complication of having a quirk - if the value doesn't matter there seems little point in doing a special dance just for the sake of semantic correctness of the in-memory PTEs, in my opinion. Agreed. We should only use quirks if the current (architecturally compliant) code causes real issues with the hardware. Then the quirk can be used to either avoid the problematic routines or to take extra steps to make things work as the architecture intended. I've asked how this IOMMU differs from the architecture on a number of occasions, but I'm still yet to receive a response other than it's special. After check further with DE, Our pagetable is refer to ARM-v7's short-descriptor which is a little different from ARM-v8. like bit0(PXN) in section and supersection, I didn't read ARM-v7 spec before, so I add a MTK quirk to disable PXN bit in section and supersection.(if the PXN bit is wrote in ARM-v7 spec, HW will page fault.) I've been reviewing this using the ARMv7 ARM (Rev.C of DDI0406C) the whole time. PXN is there as an optional field in non-LPAE implementations. That's fine and doesn't require any quirks. Then I write this code according to ARM-v8 spec defaultly, and add a ARM-v7 quirk? No, I don't think you need this, as the v8 and v7 short-descriptor formats look compatible to me. You should only need a quirk if architecturally compliant code cannot work on your hardware. And there is a little different between ARM-v7 spec and MTK pagetable. It's the XN(bit0) in small page. MTK don't implement XN bit. The bit[1:0] in MTK's small page should be 2'b10, if it's 2'b11, HW will page fault. Aha, thanks! *That* is worthy of a quirk. Something like: IO_PGTABLE_QUIRK_ARM_NO_XN (MTK don't implement AP bits too, but HW don't use them, it is ok even though AP bits is wrote) Yeah, I think that's fine. The pgtable code will honour the request but the h/w will ignore it. In the end, I will add two quirk like this, is it OK? I think you only need the one I mentioned above. I don't see the need for PXN at all (as I said in the last review). Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 5/6] iommu/mediatek: Add mt8173 IOMMU driver
On Mon, 2015-07-27 at 16:49 +0100, Robin Murphy wrote: On 27/07/15 16:31, Russell King - ARM Linux wrote: On Mon, Jul 27, 2015 at 02:23:26PM +0100, Robin Murphy wrote: On 16/07/15 10:04, Yong Wu wrote: This patch adds support for mediatek m4u (MultiMedia Memory Management Unit). Signed-off-by: Yong Wu yong...@mediatek.com [...] +static void mtk_iommu_flush_pgtable(void *ptr, size_t size, void *cookie) +{ + struct mtk_iommu_domain *domain = cookie; + unsigned long offset = (unsigned long)ptr ~PAGE_MASK; + + dma_map_page(domain-data-dev, virt_to_page(ptr), offset, +size, DMA_TO_DEVICE); Nit: this looks like it may as well be dma_map_single. It would probably be worth following it with a matching unmap too, just to avoid any possible leakage bugs (especially if this M4U ever appears in a SoC supporting RAM above the 32-bit boundary). Why not do the job properly? Take a look at how I implemented the streaming DMA API on Tegra SMMU (patch set recently sent out earlier today). There's no need for hacks like dma_map_page() (and discarding it's return value) or dma_map_page() followed by dma_unmap_page(). Indeed, as it happens I do have a branch where I prototyped that for the long-descriptor io-pgtable-arm code a while ago; this discussion has prompted me to dig it up again. Stay tuned, folks... Hi Russell, Robin, From I see in arm-smmu-v3.c in v4.2-rc1, The flush_pgtable seems like this: //== dma_addr_t dma_addr; dma_addr = dma_map_page(dev, ptr, size, DMA_TO_DEVICE); if (dma_mapping_error(dev, dma_addr)) dev_err(dev, failed to flush pgtable at %p\n, ptr); else dma_unmap_page(dev, dma_addr, size, DMA_TO_DEVICE); //== I will change map like this and use dma_map_single instead. Is this also seems to be not proper? Then how to do it? add this before unmap? : dma_sync_single_for_device(dev, dma_addr, size, DMA_TO_DEVICE); Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 2/4] iommu: Implement common IOMMU ops for DMA mapping
On Thu, Jul 16, 2015 at 07:40:13PM +0100, Robin Murphy wrote: +/** + * iommu_dma_alloc - Allocate and map a buffer contiguous in IOVA space + * @dev: Device to allocate memory for. Must be a real device + *attached to an iommu_dma_domain + * @size: Size of buffer in bytes + * @gfp: Allocation flags + * @prot: IOMMU mapping flags + * @handle: Out argument for allocated DMA handle + * @flush_page: Arch callback to flush a single page from all caches to + * ensure DMA visibility of __GFP_ZERO. May be NULL if not + * required. + * + * If @size is less than PAGE_SIZE, then a full CPU page will be allocated, + * but an IOMMU which supports smaller pages might not map the whole thing. + * + * Return: Array of struct page pointers describing the buffer, + * or NULL on failure. + */ +struct page **iommu_dma_alloc(struct device *dev, size_t size, + gfp_t gfp, int prot, dma_addr_t *handle, + void (*flush_page)(const void *, phys_addr_t)) +{ + struct iommu_domain *domain = iommu_get_domain_for_dev(dev); + struct iova_domain *iovad = domain-dma_api_cookie; + struct iova *iova; + struct page **pages; + struct sg_table sgt; + dma_addr_t dma_addr; + unsigned int count = PAGE_ALIGN(size) PAGE_SHIFT; + + *handle = DMA_ERROR_CODE; + + pages = __iommu_dma_alloc_pages(count, gfp); + if (!pages) + return NULL; + + iova = __alloc_iova(iovad, size, dev-coherent_dma_mask); + if (!iova) + goto out_free_pages; + + size = iova_align(iovad, size); + if (sg_alloc_table_from_pages(sgt, pages, count, 0, size, GFP_KERNEL)) + goto out_free_iova; + + if (gfp __GFP_ZERO) { + struct sg_mapping_iter miter; + /* + * The flushing provided by the SG_MITER_TO_SG flag only + * applies to highmem pages, so it won't do the job here. + */ The comment here is slightly wrong. The SG_MITER_FROM_SG flushing is done by calling flush_kernel_dcache_page(). This function can be no-op even on architectures implementing highmem. For example, on arm32 it only does some flushing if there is a potential D-cache alias with user space. The flush that you call below is for DMA operations, something entirely different. + sg_miter_start(miter, sgt.sgl, sgt.orig_nents, SG_MITER_FROM_SG); + while (sg_miter_next(miter)) { + memset(miter.addr, 0, PAGE_SIZE); Isn't __GFP_ZERO already honoured by alloc_pages in __iommu_dma_alloc_pages? + if (flush_page) + flush_page(miter.addr, page_to_phys(miter.page)); Could you instead do the check as !(prot IOMEM_CACHE) so that it saves architecture code from passing NULL when coherent? I'm still not convinced we need this flushing here but I'll follow up in patch 3/4. -- Catalin ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH char-misc-next 10/19] lib: convert iova.c into a library
On Tue, 2015-07-28 at 15:38 +0100, David Woodhouse wrote: On Tue, 2015-07-28 at 11:41 +0100, Robin Murphy wrote: On 28/07/15 11:03, Joerg Roedel wrote: On Mon, Jul 27, 2015 at 04:57:32PM -0700, Ashutosh Dixit wrote: From: Harish Chegondi harish.chego...@intel.com This patch converts iova.c into a library, moving it from drivers/iommu/ to lib/, and exports its virtual address allocation and management functions so that other modules can reuse them. Cc: Joerg Roedel j...@8bytes.org Reviewed-by: Anil S Keshavamurthy anil.s.keshavamur...@intel.com Reviewed-by: Sudeep Dutt sudeep.d...@intel.com Signed-off-by: Harish Chegondi harish.chego...@intel.com Where is this going to be used outside of the IOMMU world? We are using the IOVA generator in the SCIF driver posted @ http://thread.gmane.org/gmane.linux.kernel/2005895 under drivers/misc/mic/scif ...and how does it relate to the patches from Sakari (+CC) doing much the same thing[1]? The patch series from Sakari does the right thing by moving the IOVA cache management to the IOVA library. We will simply drop this current patch as it is incorrect. I merged Sakari's patches into the intel-iommu git tree today, FWIW. If there's really a need to move it from drivers/iommu/ to lib/ then we could feasibly do that too. The patch series from Sakari should work perfectly for us. We will post a v2 of the current SCIF patch series without this IOVA patch and modify the SCIF driver to use the newly added iova_cache_get(..) and iova_cache_put(..) APIs once it is available in Linus's tree. It would make it easier for us to integrate if Sakari's patches reach mainline soon. It might be cleaner to move IOVA to lib/ in the longer term since we will have multiple driver subsystems using it, but it should work just fine for now. Thanks for the review! Sudeep Dutt ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 3/4] arm64: Add IOMMU dma_ops
On Thu, Jul 16, 2015 at 07:40:14PM +0100, Robin Murphy wrote: +static void *__iommu_alloc_attrs(struct device *dev, size_t size, + dma_addr_t *handle, gfp_t gfp, + struct dma_attrs *attrs) +{ + bool coherent = is_device_dma_coherent(dev); + int ioprot = dma_direction_to_prot(DMA_BIDIRECTIONAL, coherent); + void *addr; + + if (WARN(!dev, cannot create IOMMU mapping for unknown device\n)) + return NULL; + + /* + * Some drivers rely on this, and we may not want to potentially + * expose stale kernel data to devices anyway. + */ + gfp |= __GFP_ZERO; + + if (gfp __GFP_WAIT) { + struct page **pages; + pgprot_t pgprot = coherent ? __pgprot(PROT_NORMAL) : + __pgprot(PROT_NORMAL_NC); + + pgprot = __get_dma_pgprot(attrs, pgprot, coherent); I think you can simplify this: pgprot_t pgprot = __get_dma_pgprot(attrs, PAGE_KERNEL, coherent); (and we could do the same in __dma_alloc) + pages = iommu_dma_alloc(dev, size, gfp, ioprot, handle, + coherent ? NULL : flush_page); I commented on patch 2/4. If we checked for IOMMU_CACHE in iommu_dma_alloc(), we could always pass flush_page here without the NULL. + if (!pages) + return NULL; + + addr = dma_common_pages_remap(pages, size, VM_USERMAP, pgprot, + __builtin_return_address(0)); + if (!addr) + iommu_dma_free(dev, pages, size, handle); For arm64, it would have been easier to simply flush the caches here and avoid the flush_page argument. However, I reread your earlier reply, so if we ever honour the ATTR_NO_KERNEL_MAPPING, then we would have to iterate over the individual pages in the arch code. Let's leave it as per your patch 2/4 (with a flush_page argument). -- Catalin ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 4/4] arm64: Hook up IOMMU dma_ops
On Thu, Jul 16, 2015 at 07:40:15PM +0100, Robin Murphy wrote: With iommu_dma_ops in place, hook them up to the configuration code, so IOMMU-fronted devices will get them automatically. Signed-off-by: Robin Murphy robin.mur...@arm.com Acked-by: Catalin Marinas catalin.mari...@arm.com ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC PATCH 1/1] iommu: Detach device from domain when removed from group
This patch adds a call to __iommu_detach_device() to the iommu_group_remove_device() function, which will trigger a missing detach_dev callback in (at least) the following scenario: When a user completes the VFIO_SET_IOMMU ioctl for a vfio-pci device, and the corresponding device is removed thereafter (before any other ioctl like VFIO_GROUP_GET_DEVICE_FD), then the detach_dev callback of the underlying IOMMU API is never called. This also fixes an asymmetry with iommu_group_add_device() and iommu_group_remove_device(), where the former did an attach_dev but the latter did no detach_dev. Signed-off-by: Gerald Schaefer gerald.schae...@de.ibm.com --- drivers/iommu/iommu.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index f286090..82ac8b3 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -447,6 +447,9 @@ rename: } EXPORT_SYMBOL_GPL(iommu_group_add_device); +static void __iommu_detach_device(struct iommu_domain *domain, + struct device *dev); + /** * iommu_group_remove_device - remove a device from it's current group * @dev: device to be removed @@ -466,6 +469,8 @@ void iommu_group_remove_device(struct device *dev) IOMMU_GROUP_NOTIFY_DEL_DEVICE, dev); mutex_lock(group-mutex); + if (group-domain) + __iommu_detach_device(group-domain, dev); list_for_each_entry(tmp_device, group-devices, list) { if (tmp_device-dev == dev) { device = tmp_device; -- 2.3.8 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC PATCH 0/1] iommu: Detach device from domain when removed from group
Hi, during IOMMU API function testing on s390 I hit the following scenario: After binding a device to vfio-pci, the user completes the VFIO_SET_IOMMU ioctl and stops, see the sample C program below. Now the device is manually removed via echo 1 /sys/bus/pci/devices/.../remove. Although the SET_IOMMU ioctl triggered the attach_dev callback in the underlying IOMMU API, removing the device in this way won't trigger the detach_dev callback, neither during remove nor when the user program continues with closing group/container. On s390, this eventually leads to a kernel panic when binding the device again to its non-vfio PCI driver, because of the missing arch-specific cleanup in detach_dev. On x86, the detach_dev callback will also not be called directly, but there is a notifier that will catch BUS_NOTIFY_REMOVED_DEVICE and eventually do the cleanup. Other architectures w/o the notifier probably have at least some kind of memory leak in this scenario, so a general fix would be nice. My first approach was to try and fix this in VFIO code, but Alex Williamson pointed me to some asymmetry in the IOMMU code: iommu_group_add_device() will invoke the attach_dev callback, but iommu_group_remove_device() won't trigger detach_dev. Fixing this asymmetry would fix the issue for me, but is this the correct fix? Any thoughts? Regards, Gerald Here is the sample C program to trigger the ioctl: #include stdio.h #include fcntl.h #include linux/vfio.h int main(void) { int container, group, rc; container = open(/dev/vfio/vfio, O_RDWR); if (container 0) { perror(open /dev/vfio/vfio\n); return -1; } group = open(/dev/vfio/0, O_RDWR); if (group 0) { perror(open /dev/vfio/0\n); return -1; } rc = ioctl(group, VFIO_GROUP_SET_CONTAINER, container); if (rc) { perror(ioctl VFIO_GROUP_SET_CONTAINER\n); return -1; } rc = ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU); if (rc) { perror(ioctl VFIO_SET_IOMMU\n); return -1; } printf(Try device remove...\n); getchar(); close(group); close(container); return 0; } Gerald Schaefer (1): iommu: Detach device from domain when removed from group drivers/iommu/iommu.c | 5 + 1 file changed, 5 insertions(+) -- 2.3.8 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2] iommu/omap: Add support for configuring dsp iommus on DRA7xx
* Suman Anna s-a...@ti.com [150724 09:27]: Hi Tony, On 07/23/2015 11:30 PM, Tony Lindgren wrote: * Suman Anna s-a...@ti.com [150723 09:25]: Hi Tony, On 07/23/2015 02:24 AM, Tony Lindgren wrote: * Suman Anna s-a...@ti.com [150722 09:25]: On 07/22/2015 12:26 AM, Tony Lindgren wrote: I don't like using syscon for tinkering directly with SoC registers. This is not a SoC-level register, but a register within a sub-module of the DSP processor sub-system. The DSP_SYSTEM sub-module in general is described in Section 5.3.3 of the TRM [1], and it implements different functionalities like the PRCM handshaking, wakeup logic and DSP subsystem top-level configuration. It is a module present within the DSP processor sub-system, so can only be accessed when the sub-system is clocked and the appropriate reset is released. OK so if it's specific to the DSP driver along the lines of sysc and syss registers. There will be those registers too within the MMU config register space, even for DRA7xx MMUs. This is different, think of it like a register in the Control module except that it is present within the DSP sub-system instead of at the SoC level. And what is taking care of pm_runtime_get here to ensure the module is powered and clocked? pm_runtime_get_sync is indeed getting invoked, its just the current patch doesn't include the code block that invokes it. The function that invokes omap2_iommu_enable does so after a call to the pm_runtime_get_sync() call. OK I think you are missing a layer here and it's the Linux kernel side device driver for DSP that initializes things. We already have separate drivers for MMUs (omap-iommu) and the processor management (omap-rproc). The former manages all the low-level programming sequences for the MMUs, while the latter manages the low-level reset etc, and is a client user of the respective IOMMU device. Both integrate into the respective frameworks. The IOMMU API invocations are handled in the remoteproc core, with the OMAP remoteproc driver publishing itself as having an MMU with the remoteproc core. The IOMMU API invoke the appropriate iommu_ops. You can lookup the functions rproc_enable_iommu()/rproc_disable_iommu() in the remoteproc core. The IOMMU enabling sequences happen within the iommu_attach_device() API. The call flow is iommu_attach_device()-omap_iommu_attach_dev()-omap_iommu_attach()- iommu_enable()- omap_device_deassert_hardreset, pm_runtime_get_sync, omap2_iommu_enable. OK. The thing to check here is that you have a separate device driver for each sysc/syss device registers. This is because each hardware module can be independently clocked and idled. Otherwise things will break at some point. And no things are configured for autoidle or we're not doing PM is not a good excuse here :) Can you please check that? If the remoteproc driver and iommu driver are tinkering with registers in separate hardware modules, we have a layering violation. My guess is that we have at least two hardware modules involved here, one for the iommu and one for the DSP device. Typically we handle these registers by mapping them to the PM runtime functions for the interconnect so we can reset and idle the hardware modules even if there is no driver, see for example omap54xx_mmu_dsp_hwmod. I haven't yet submitted the DRA7xx hwmods, but they will look almost exactly like the OMAP5 ones. The reset and idle on these are in general not effective at boot time since these are also controlled by a hard-reset line, so that's left to the drivers to deal with it through the omap_device_deassert_hardreset API. If the MMU configuration is one time init, it may make sense to add it to the hwmod reset function. However, if the Linux kernel side driver needs to configure things depending on the DSP firmware, it should be done in the kernel side device driver. The MMU configuration comes into picture whenever the remoteproc driver is being booted and shut down, and also during suspend (no support for this yet in mainline on OMAP rproc). Today, the hwmod _enable/_idle/reset functions alone are not enough to power on the OMAP remoteproc/iommus. We need sequencing calls to both omap_device_assert/_deassert_hardreset (done through pdata-quirks) and pm_runtime API to achieve this. Right and that's why I'm worried that we have multiple hardware modules involved :) We should use some Linux generic framework for configuring these bits to avoid nasty dependencies between various hardware modules on the SoC. What does DSP_SYS_MMU_CONFIG register do? It seems it's probably a regulator or a gate clock? If so, it should be set up as a regulator or a clock and then the omap-iommu driver can just use regulator or clcok framework to request the resource. No, its neither. It is a control bit that dictates whether the processor/EDMA addresses go through the respective MMU
Re: [PATCH char-misc-next 10/19] lib: convert iova.c into a library
On Mon, 27 Jul 2015 16:57:32 -0700 Ashutosh Dixit ashutosh.di...@intel.com wrote: From: Harish Chegondi harish.chego...@intel.com This patch converts iova.c into a library, moving it from drivers/iommu/ to lib/, and exports its virtual address allocation and management functions so that other modules can reuse them. From the following emails it is unclear to me whether we'll actually be going ahead with this, but whatever. It's a chance to do some code reading. {drivers/iommu = lib}/iova.c | 9 + Except the code isn't here. Stupid git. Here 'tis: /* * Copyright __ 2006-2009, Intel Corporation. Non-ascii thing which I can't get to display correctly in anything. Give it up, it'll never work, use (c). * This program is free software; you can redistribute it and/or modify it * under the terms and conditions of the GNU General Public License, * version 2, as published by the Free Software Foundation. * * This program is distributed in the hope it will be useful, but WITHOUT * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for * more details. * * You should have received a copy of the GNU General Public License along with * this program; if not, write to the Free Software Foundation, Inc., 59 Temple * Place - Suite 330, Boston, MA 02111-1307 USA. * * Author: Anil S Keshavamurthy anil.s.keshavamur...@intel.com */ #include linux/iova.h #include linux/slab.h Wow. This file only compiles by sheer dumb luck. kernel.h? spinlock.h? bitops.h? Some of these things come in thanks to iova.h, others don't. static struct kmem_cache *iommu_iova_cache; int iommu_iova_cache_init(void) Should actually be called iommu_iova_cache_create(). Because it's a wrapper around kmem_cache_create(), and create is what it does. { int ret = 0; iommu_iova_cache = kmem_cache_create(iommu_iova, sizeof(struct iova), 0, SLAB_HWCACHE_ALIGN, NULL); Could use KMEM_CACHE(). if (!iommu_iova_cache) { pr_err(Couldn't create iova cache\n); ret = -ENOMEM; } return ret; } void iommu_iova_cache_destroy(void) The naming throughout this driver is a mess. Sometimes it's iommu_iova_operation. Other times it's operation_iova. Other times it's operation_iova_operation. There's no thought and no consistency. The usual standard will work well here. Stick with iova_operation and use it consistently. So iova_cache_create iova_cache_destroy iova_mem_free iova_init_domain iova_alloc iova_find iova_etcetera { kmem_cache_destroy(iommu_iova_cache); } struct iova *alloc_iova_mem(void) { return kmem_cache_alloc(iommu_iova_cache, GFP_ATOMIC); } GFP_ATOMIC is unreliable and undesirably depletes page allocator reserves. If it is really really the case that no caller can ever allocate an iova in any other context then OK, but that's an extraordinary thing and I suggest it should be well documented in code comments. However I suspect the thing to do here is to convert this to take a gfp_t argument to permit callers to use the stronger memory allocation strategies. void free_iova_mem(struct iova *iova) { kmem_cache_free(iommu_iova_cache, iova); } void init_iova_domain(struct iova_domain *iovad, unsigned long granule, unsigned long start_pfn, unsigned long pfn_32bit) { /* * IOVA granularity will normally be equal to the smallest * supported IOMMU page size; both *must* be capable of * representing individual CPU pages exactly. */ BUG_ON((granule PAGE_SIZE) || !is_power_of_2(granule)); spin_lock_init(iovad-iova_rbtree_lock); iovad-rbroot = RB_ROOT; iovad-cached32_node = NULL; iovad-granule = granule; iovad-start_pfn = start_pfn; iovad-dma_32bit_pfn = pfn_32bit; } static struct rb_node * __get_cached_rbnode(struct iova_domain *iovad, unsigned long *limit_pfn) { if ((*limit_pfn != iovad-dma_32bit_pfn) || (iovad-cached32_node == NULL)) return rb_last(iovad-rbroot); else { struct rb_node *prev_node = rb_prev(iovad-cached32_node); struct iova *curr_iova = container_of(iovad-cached32_node, struct iova, node); *limit_pfn = curr_iova-pfn_lo - 1; return prev_node; } } static void __cached_rbnode_insert_update(struct iova_domain *iovad, unsigned long limit_pfn, struct iova *new) { if (limit_pfn != iovad-dma_32bit_pfn) return; iovad-cached32_node = new-node; } static void __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free) {