Re: [PATCH v8 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On 01/12/15 at 04:00pm, Li, ZhenHua wrote:
> Comparing to v7, this version adds only a few lines of code. In function copy_page_table:
>
>     + __iommu_flush_cache(iommu, phys_to_virt(dma_pte_next),
>     +                     VTD_PAGE_SIZE);

So this addition fixes the reported DMAR fault on Takao's system, right?

On 01/12/2015 03:06 PM, Li, Zhen-Hua wrote:
> This patchset is an update of Bill Sumner's patchset and implements a fix for the following problem: if a kernel boots with intel_iommu=on on a system that supports Intel VT-d, and a panic happens, the kdump kernel will boot with these faults:
>
>     dmar: DRHD: handling fault status reg 102
>     dmar: DMAR:[DMA Read] Request device [01:00.0] fault addr fff8
>     DMAR:[fault reason 01] Present bit in root entry is clear
>     dmar: DRHD: handling fault status reg 2
>     dmar: INTR-REMAP: Request device [61:00.0] fault index 42
>     INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
>
> On some systems the interrupt remapping fault will happen even if intel_iommu is not set to on, because interrupt remapping is enabled whenever x2apic is needed by the system.
>
> The cause of the DMA fault is described in Bill's original version, and the INTR-REMAP fault is caused by a similar reason. In short, the initialization of the VT-d driver causes in-flight DMA and interrupt requests to get wrong responses.
>
> To fix this problem, we modify the behavior of Intel VT-d in the crashdump kernel.
>
> For DMA remapping:
> 1. Accept the VT-d hardware in an active state.
> 2. Do not disable and re-enable the translation; keep it enabled.
> 3. Use the old root entry table; do not rewrite the RTA register.
> 4. Allocate new context entry tables and page tables, and copy the data from the old ones used by the old kernel.
> 5. Use different portions of the IOVA address ranges for the device drivers in the crashdump kernel than the ranges that were in use at the time of the panic.
> 6. After a device driver is loaded, when it issues its first dma_map command, free the dmar_domain structure for the device and generate a new one, so that the device can be assigned a new and empty page table.
> 7. When a new context entry table is generated, also save its address to the old root entry table.
>
> For interrupt remapping:
> 1. Accept the VT-d hardware in an active state.
> 2. Do not disable and re-enable interrupt remapping; keep it enabled.
> 3. Use the old interrupt remapping table; do not rewrite the IRTA register.
> 4. When an ioapic entry is set up, the interrupt remapping table is changed, and the updated data is stored back to the old interrupt remapping table.
>
> Advantages of this approach:
> 1. All manipulation of the IO device is done by the Linux device driver for that device.
> 2. This approach behaves in a manner very similar to operation without an active IOMMU.
> 3. Any activity between the IO device and its RMRR areas is handled by the device driver in the same manner as during a non-kdump boot.
> 4. If an IO device has no driver in the kdump kernel, it is simply left alone. This supports the practice of creating a special kdump kernel without drivers for any devices that are not required for taking a crashdump.
> 5. Minimal code changes to the existing mainline Intel VT-d code.
>
> Summary of changes in this patch set:
> 1. Added some useful functions for the root entry table in intel-iommu.c.
> 2. Added new members to struct root_entry and struct irte.
> 3. Functions to load the old root entry table into iommu->root_entry from the memory of the old kernel.
> 4. Functions to allocate new context entry tables and page tables and copy the data from the old ones into them.
> 5. Functions to enable support for DMA remapping in the kdump kernel.
> 6. Functions to load old IRTE data from the old kernel into the kdump kernel.
> 7. Some code changes that support the other behaviors listed above.
> 8. In the new functions, use physical addresses as unsigned long, not as pointers.
>
> Original version by Bill Sumner:
>     https://lkml.org/lkml/2014/1/10/518
>     https://lkml.org/lkml/2014/4/15/716
>     https://lkml.org/lkml/2014/4/24/836
>
> Zhenhua's updates:
>     https://lkml.org/lkml/2014/10/21/134
>     https://lkml.org/lkml/2014/12/15/121
>     https://lkml.org/lkml/2014/12/22/53
>     https://lkml.org/lkml/2015/1/6/1166
>
> Changelog[v8]:
> 1. Add a missing __iommu_flush_cache in function copy_page_table.
>
> Changelog[v7]:
> 1. Use __iommu_flush_cache to flush the data to hardware.
>
> Changelog[v6]:
> 1. Use unsigned long as the type of physical addresses.
> 2. Use new function unmap_device_dma to unmap the old DMA.
> 3. Fix some small incorrect bit orders for the aw shift.
>
> Changelog[v5]:
> 1. Do not disable and re-enable translation and interrupt remapping.
> 2. Use the old root entry table.
> 3. Use the old interrupt remapping
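The one-line v8 fix above can be illustrated with a small user-space model. This is not the kernel code: the struct, the `flushed` flag, and the function names are stand-ins for the real VT-d tables and __iommu_flush_cache(). The point it shows is that any CPU write to a table the IOMMU hardware walks must be written back from the CPU caches before the hardware may see it.

```c
#include <assert.h>
#include <string.h>

#define FAKE_TABLE_ENTRIES 512

/* Stand-in for a VT-d page/context table page; "flushed" models
 * whether the CPU cache lines covering it were written back. */
struct fake_table {
    unsigned long entries[FAKE_TABLE_ENTRIES];
    int flushed;
};

/* Stand-in for __iommu_flush_cache(iommu, addr, size). */
static void fake_flush_cache(struct fake_table *t)
{
    t->flushed = 1;
}

/* Models copy_page_table: copy entries from the old kernel's table,
 * then flush -- the step that was missing before v8. */
static void fake_copy_page_table(struct fake_table *new_table,
                                 const struct fake_table *old_table)
{
    memcpy(new_table->entries, old_table->entries,
           sizeof(new_table->entries));
    new_table->flushed = 0;       /* dirty in CPU cache after the copy */
    fake_flush_cache(new_table);  /* without this, HW may read stale data */
}
```

Without the final flush, the model ends with `flushed == 0`, which is exactly the state in which the real hardware could walk stale table data and raise the DMAR faults quoted above.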
Re: [PATCH v8 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On 01/12/2015 05:07 PM, Baoquan He wrote:
> On 01/12/15 at 04:00pm, Li, ZhenHua wrote:
>> Comparing to v7, this version adds only a few lines of code. In function copy_page_table:
>>
>>     + __iommu_flush_cache(iommu, phys_to_virt(dma_pte_next),
>>     +                     VTD_PAGE_SIZE);
>
> So this addition fixes the reported DMAR fault on Takao's system, right?

I am not sure whether it can fix the dmar fault on Takao's system, but I hope it can.

On 01/12/2015 03:06 PM, Li, Zhen-Hua wrote:
> This patchset is an update of Bill Sumner's patchset and implements a fix for the VT-d faults in the kdump kernel.
> [...]
Re: [PATCH v8 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
Comparing to v7, this version adds only a few lines of code. In function copy_page_table:

    + __iommu_flush_cache(iommu, phys_to_virt(dma_pte_next),
    +                     VTD_PAGE_SIZE);

On 01/12/2015 03:06 PM, Li, Zhen-Hua wrote:
> This patchset is an update of Bill Sumner's patchset and implements a fix for the VT-d faults in the kdump kernel.
> [...]
> Changelog[v5]:
> 1. Do not disable and re-enable translation and interrupt remapping.
> 2. Use the old root entry table.
> 3. Use the old interrupt remapping table.
> 4. New functions to copy data from the old kernel, and save it back to the old kernel's memory.
> 5. New functions to save the updated root entry table and irte table.
> 6. Use intel_unmap to unmap the old DMA.
> 7. Allocate new
Re: [PATCH RESEND] dma-mapping: tidy up dma_parms default handling
On 09/01/15 19:45, Arnd Bergmann wrote:
> On Friday 09 January 2015 16:56:03 Robin Murphy wrote:
>> This one's a bit tricky to find a home for - I think technically it's probably an IOMMU patch, but then the long-underlying problem doesn't seem to have blown up anything until arm64, and my motivation is to make bits of Juno work, which seems to nudge it towards arm64/arm-soc territory. Could anyone suggest which tree is most appropriate?
>
> I have a set of patches touching various dma-mapping.h related bits across architectures, and in ARM in particular. Your patch fits into that series, and I guess we could either have it in my asm-generic tree or in Andrew Morton's mm tree. Possibly also arm-soc for practical reasons, although it really doesn't belong in there.

Thanks Arnd, I'd agree asm-generic or mm sound the most sensible - if you're happy to carry this patch with your series, that'd be really helpful.

Robin.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v8 01/10] iommu/vt-d: Update iommu_attach_domain() and its callers
On Mon, Jan 12, 2015 at 03:06:19PM +0800, Li, Zhen-Hua wrote:
> Allow specification of the domain-id for the new domain. This patch only adds the 'did' parameter to iommu_attach_domain() and modifies all of its callers to specify the default value of -1, which says "no did specified, allocate a new one".

I think it's better to keep the old iommu_attach_domain() interface in place and introduce a new function (like iommu_attach_domain_with_id() or something) which has the additional parameter. Then you can rewrite iommu_attach_domain():

    iommu_attach_domain(...)
    {
        return iommu_attach_domain_with_id(..., -1);
    }

This way you don't have to update all the callers of iommu_attach_domain(), and the interface is more readable.

Joerg
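Joerg's suggestion is the common wrapper pattern: keep the existing entry point unchanged and forward it to a new variant carrying the extra argument. A minimal self-contained sketch, with illustrative int parameters standing in for the real intel-iommu domain/iommu structures:

```c
#include <assert.h>

/* Records the last domain-id requested, so the example can show
 * what each entry point passes down (illustrative only). */
static int recorded_did;

/* New variant with the extra 'did' parameter. */
static int iommu_attach_domain_with_id(int domain, int iommu, int did)
{
    recorded_did = did;  /* -1 means: no did specified, allocate one */
    return 0;
}

/* Old interface kept intact -- existing callers need no update. */
static int iommu_attach_domain(int domain, int iommu)
{
    return iommu_attach_domain_with_id(domain, iommu, -1);
}
```

The benefit is purely at the interface level: only the kdump path calls the `_with_id` variant, while every existing call site keeps its current, shorter signature.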
Re: [PATCH v8 02/10] iommu/vt-d: Items required for kdump
On Mon, Jan 12, 2015 at 03:06:20PM +0800, Li, Zhen-Hua wrote:
> +#ifdef CONFIG_CRASH_DUMP
> +
> +/*
> + * Fix Crashdump failure caused by leftover DMA through a hardware IOMMU
> + *
> + * Fixes the crashdump kernel to deal with an active iommu and legacy
> + * DMA from the (old) panicked kernel in a manner similar to how legacy
> + * DMA is handled when no hardware iommu was in use by the old kernel --
> + * allow the legacy DMA to continue into its current buffers.
> + *
> + * In the crashdump kernel, this code:
> + * 1. skips disabling the IOMMU's translating of IO Virtual Addresses (IOVA).
> + * 2. Do not re-enable IOMMU's translating.
> + * 3. In kdump kernel, use the old root entry table.
> + * 4. Leaves the current translations in-place so that legacy DMA will
> + *    continue to use its current buffers.
> + * 5. Allocates to the device drivers in the crashdump kernel
> + *    portions of the iova address ranges that are different
> + *    from the iova address ranges that were being used by the old kernel
> + *    at the time of the panic.
> + */

It looks like you are still copying the io-page-tables from the old kernel into the kdump kernel, is that right? With the approach that was proposed, you only need to copy over the context entries 1-1. They are still pointing to the page-tables in the old kernel's memory (which is just fine). The root-entry of the old kernel is also re-used, and when the kdump kernel starts to use a device, its context entry is updated to point to a newly allocated page-table.

Joerg
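The alternative Joerg describes can be modeled in a few lines. These are illustrative stand-ins, not the real VT-d structures: the context entry is copied verbatim, so it keeps pointing at the old kernel's page table, and only when the kdump kernel first uses the device is it repointed at a freshly allocated, empty table.

```c
#include <assert.h>

/* Stand-in for a VT-d context entry: just the page-table pointer. */
struct fake_context_entry {
    unsigned long pgtable_phys;
};

/* 1-1 copy: the new entry still references the old kernel's page
 * table, so in-flight legacy DMA keeps hitting its old buffers. */
static void copy_context_entry(struct fake_context_entry *dst,
                               const struct fake_context_entry *src)
{
    *dst = *src;
}

/* When the kdump kernel starts using the device, swap in a new
 * (empty) page table allocated by the kdump kernel itself. */
static void attach_fresh_pgtable(struct fake_context_entry *ce,
                                 unsigned long new_pgtable_phys)
{
    ce->pgtable_phys = new_pgtable_phys;
}
```

The design point is that the old io-page-tables never need to be copied at all; they are merely referenced until the device's driver takes over.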
Re: Exynos IOMMU driver doesn't work?
On 9 January 2015 at 23:34, Javier Martinez Canillas jav...@dowhile0.org wrote:
> [adding Marek, Sjoerd and Joonyoung that were discussing about iommu support in another thread]

Thank you Javier.

> Hello Hongbo,
>
> On Fri, Jan 9, 2015 at 8:31 AM, Hongbo Zhang hongbo.zh...@linaro.org wrote:
>> Add linux-samsung-...@vger.kernel.org mailing list.
>>
>> On 7 January 2015 at 18:31, Hongbo Zhang hongbo.zh...@linaro.org wrote:
>>> Hi Cho KyongHo, Joerg et al,
>>> I found the latest Exynos IOMMU driver doesn't work; line 481:
>>>     BUG_ON(!has_sysmmu(dev));
>>> in function __exynos_sysmmu_enable() in file exynos-iommu.c triggers a kernel panic. Then I found that dev->archdata.iommu isn't initialized at all, which should be the root cause.
>
> That's correct, I found the same the other day and thought about posting a patch to return -ENODEV if !has_sysmmu(dev) instead, to avoid the driver panicking the kernel. But then I realized this is already fixed in Marek's "[PATCH v3 00/19] Exynos SYSMMU (IOMMU) integration with DT and DMA-mapping subsystem" series [0].
>
>>> Another problem is that this driver has gained device tree support, but there are no device tree nodes in the dts file, so I had to search the internet and add those nodes manually. I've found these links of the v12 and v13 patches:
>>>     https://lkml.org/lkml/2014/4/27/171
>>>     https://lkml.org/lkml/2014/5/12/34
>>> Patch v13 was merged into the mainline kernel, but as a part of v12 it isn't complete and doesn't work alone, e.g. the dts nodes are missing. (I didn't research much which patch introduced the dev->archdata.iommu initialization error, but very old code seems not to have this problem.)
>
> Yes, please take a look at Marek's series [0]. Keep in mind that the series does not support all sysmmu revisions, so IOMMU is not supported for some SoCs (e.g. Exynos5). Support for that is planned once the series lands in mainline though [1].
>
> May I ask why you are interested in IOMMU support on Exynos? I'm asking because the reason why I tried to enable IOMMU support (and hit the same issue) was to try using the Exynos DRM HDMI driver with IOMMU, since I found that HDMI is working on the downstream Samsung kernel [2] that has IOMMU support, but is not working on mainline.

Because I am testing vfio-platform patches, IOMMU is used in this case:
http://www.spinics.net/lists/kvm-arm/msg12445.html

And I am glad to find a working kernel as you pointed out; I found these two commits in this tree that may solve my problem:

    841a7fe TEMP/TO POST: iommu: exynos: Add mmu-masters support
    bd7e4c7 TEMP/TO POST: ARM: dts: add System MMU nodes of Exynos SoCs

> At the end the HDMI problem seems to not be IOMMU related but something with the power domains and clocking, but in case you are facing the same issue, you may be interested in that discussion [3].
>
> Best regards,
> Javier
>
> [0]: http://www.spinics.net/lists/linux-samsung-soc/msg39168.html
> [1]: http://www.spinics.net/lists/linux-samsung-soc/msg39980.html
> [2]: g...@github.com:exynos-reference/kernel.git
> [3]: http://www.spinics.net/lists/linux-samsung-soc/msg40828.html
Re: [PATCH RESEND] dma-mapping: tidy up dma_parms default handling
On Fri, Jan 09, 2015 at 07:45:49PM +0000, Arnd Bergmann wrote:
> On Friday 09 January 2015 16:56:03 Robin Murphy wrote:
>> This one's a bit tricky to find a home for - I think technically it's probably an IOMMU patch, but then the long-underlying problem doesn't seem to have blown up anything until arm64, and my motivation is to make bits of Juno work, which seems to nudge it towards arm64/arm-soc territory. Could anyone suggest which tree is most appropriate?
>
> I have a set of patches touching various dma-mapping.h related bits across architectures, and in ARM in particular. Your patch fits into that series, and I guess we could either have it in my asm-generic tree or in Andrew Morton's mm tree. Possibly also arm-soc for practical reasons, although it really doesn't belong in there.

I also have a couple of fixes for issues found by Laurent for tearing down the IOMMU dma ops, so you could include those too. I'll send them out this afternoon.

Will
Re: Exynos IOMMU driver doesn't work?
Hello Hongbo,

On Mon, Jan 12, 2015 at 11:51 AM, Hongbo Zhang hongbo.zh...@linaro.org wrote:
> On 9 January 2015 at 23:34, Javier Martinez Canillas wrote:
>> Yes, please take a look at Marek's series [0]. Keep in mind that the series does not support all sysmmu revisions, so IOMMU is not supported for some SoCs (e.g. Exynos5). Support for that is planned once the series lands in mainline though [1].
>>
>> May I ask why you are interested in IOMMU support on Exynos? I'm asking because the reason why I tried to enable IOMMU support (and hit the same issue) was to try using the Exynos DRM HDMI driver with IOMMU, since I found that HDMI is working on the downstream Samsung kernel [2] that has IOMMU support, but is not working on mainline.
>
> Because I am testing vfio-platform patches, IOMMU is used in this case:
> http://www.spinics.net/lists/kvm-arm/msg12445.html

Ok, different use case then.

> And I am glad to find a working kernel as you pointed out,

Glad that you found the information useful.

> I found these two commits in this tree that may solve my problem:
>
>     841a7fe TEMP/TO POST: iommu: exynos: Add mmu-masters support
>     bd7e4c7 TEMP/TO POST: ARM: dts: add System MMU nodes of Exynos SoCs

Yes, that's what we cherry-picked as well to test HDMI, since enabling IOMMU has a side effect of turning on the right power domains and enabling the needed clocks.

Best regards,
Javier
Re: [PATCH v8 02/10] iommu/vt-d: Items required for kdump
On Mon, Jan 12, 2015 at 04:22:08PM +0100, Joerg Roedel wrote:
> On Mon, Jan 12, 2015 at 03:06:20PM +0800, Li, Zhen-Hua wrote:
>> +#ifdef CONFIG_CRASH_DUMP
>> +
>> +/*
>> + * Fix Crashdump failure caused by leftover DMA through a hardware IOMMU
>> [...]
>> + */
>
> It looks like you are still copying the io-page-tables from the old kernel into the kdump kernel, is that right? With the approach that was proposed, you only need to copy over the context entries 1-1. They are still pointing to the page-tables in the old kernel's memory (which is just fine).

Kdump has the notion of a backup region, where certain parts of the old kernel's memory can be moved to a different location (the first 640K on x86 as of now) so that the new kernel can make use of that memory. So we will have to make sure that no parts of the old page tables fall into a backup region.

Thanks
Vivek
Re: [PATCH 0/3] PCI/x86: Interface for testing multivector MSI support
On Thu, 2015-01-08 at 09:15 -0700, Bjorn Helgaas wrote:
> On Fri, Nov 21, 2014 at 03:08:27PM -0700, Alex Williamson wrote:
>> I'd like to make vfio-pci capable of manipulating the device exposed to the user such that if the host can only support a single MSI vector, then we hide the fact that the device itself may actually be able to support more. When we virtualize PCI config space and interrupt setup, there's no PCI protocol for the device failing to allocate the number of vectors that it said were available. If the userspace driver is a guest operating system, it certainly doesn't expect this to fail. I don't think we can ever guarantee that a multi-vector request will succeed, but we can certainly guarantee that it will fail if the platform doesn't support it.
>>
>> An example device is the Atheros AR93xxx running in a Windows 7 VM. Both the device and the guest OS support multiple MSI vectors. With interrupt remapping, such that the host supports multivector, the device works well in the guest. With interrupt remapping disabled, the device is far less reliable because of the mismatch between MSI programming and driver configuration, and often fails. If vfio-pci can test whether multiple vectors are supported, then we can make it work reliably in both cases by adjusting the exposed MSI capability, like in this patch that would follow this series:
>>
>>     https://github.com/awilliam/linux-vfio/commit/9ace67515680
>>
>> With this series, only x86 with interrupt remapping will advertise support for multiple MSI vectors. In surveying the code, I couldn't find any other archs that allowed it, but I'll take corrections if that's untrue. Thanks,
>
> Per Thomas' comments and your possible workaround if we don't have pci_msi_supported(), I'm going to ignore these for now. Let me know if you disagree.

Yep, that's fine. I'll either forget about this for a while or kludge something into vfio to know that only x86 with interrupt remapping, which I can test from the IOMMU API, has multivector MSI support.

Thanks,
Alex
Re: [PATCH] iommu/amd: Track when amd_iommu_v2 init is complete
Hi Oded,

On Mon, Dec 22, 2014 at 12:23:44PM +0200, Oded Gabbay wrote:
> The drm guys suggested we move the iommu/ subsystem before the gpu/ subsystem in drivers/Makefile instead of the above patch (and the complementing patch-set in amdkfd). I did that and it works, so please see this patch as discarded for now. I will send a new patch-set shortly.

Yeah, this is still a hack, but a better solution than tracking the initialization order manually.

Joerg
Re: [RFC PATCH 0/4] Genericise the IOVA allocator
Hi Joerg,

On 12/01/15 15:52, Joerg Roedel wrote:
[...]
> Thanks for doing this, I like this patch-set. I would also appreciate it if someone from Intel could have a look at it, David? Besides, can you please re-post this patch-set rebased to the latest upstream with the better versions of patch 1 and 2? I consider applying these changes then. Thanks!

Funnily enough, that's on my "things I didn't quite get round to yesterday" list. I have this series rebased onto -rc3 with the comments addressed, and I'm in the middle of a final cleanup and check of the arm64 dma-mapping stuff on top of it. Expect to see both later today.

Robin.
Re: [PATCH v3 00/19] Exynos SYSMMU (IOMMU) integration with DT and DMA-mapping subsystem
Hello Joonyoung,

On 01/12/2015 07:40 AM, Joonyoung Shim wrote:
>> And also making changes to the clocks in the clk-exynos5420 driver. Can you please explain the rationale for those changes? I'm asking because without your clock changes (only adding the DISP1 pd and making the devices consumers), I have HDMI output too but the video is even worse. This [0] is the minimal change I have on top of 3.19-rc3 to have some output.
>
> I just referred to the patches below:
>     http://comments.gmane.org/gmane.linux.kernel.samsung-soc/34576
> But I'm not sure whether the DISP1 power domain is the same case as the MFC power domain.

Thanks a lot for sharing those patches, now your changes are much clearer to me. So there seem to be two issues here: one is the mixer and hdmi modules not being attached to the DISP1 power domain, and another is the clock setup not being correct for proper HDMI video output.

> Hmm, I can see normal hdmi output still from the latest upstream kernel (3.19-rc4) with my kernel changes and the u-boot changes (DISP1 power domain disable) from my prior mail, on an Odroid XU3 board.

I thought you said in another email that after commit 2ed127697eb1, which landed in 3.19-rc1, you had bad HDMI output?

Your changes were missing the SW_ACLK_400_DISP1 and USER_ACLK_400_DISP1 clock mux outputs that go to internal buses in the DISP1. Adding IDs for these in the exynos5420 clock driver, and to the parent and input clock pairs list in the DISP1 power domain, gives me good HDMI output on 3.19-rc2. Also, SW_ACLK_300_DISP1 and USER_ACLK_300_DISP1 are needed for the FIMD parent and input clock respectively. Adding those to the clocks list of the DISP1 power domain gives me a working display + HDMI on my Exynos5800 Peach Pi. These are the changes I have now [0]. Please let me know what you think.

I didn't have this issue when testing your patch against 3.19-rc2. From your log I see that you are testing on 3.18.1, so maybe it makes sense to test with the latest kernel version, since this HDMI issue qualifies as a 3.19-rc fix? Since commit 2ed127697eb1 ("PM / Domains: Power on the PM domain right after attach completes"), which landed in 3.19-rc1, the power domain is powered on when a device is attached. So maybe that is what makes a difference here?

> I'm not sure, but I get the same error results from 3.19-rc4. Did you test using the exynos drm driver? I used modetest from libdrm.

Yes; I was not able to trigger that by running modetest, but by turning off my HDMI monitor and then turning it on again. When the monitor is turned on, I see a "Power domain power-domain disable failed" message and the imprecise external abort error. I had to disable CONFIG_DRM_EXYNOS_DP in order to trigger it, which is why I was not able to reproduce it before. I think though that this is a separate issue from HDMI not working, since power domains should be able to have many consumer devices, and I see that other power domains are used that way.
Best regards, Javier [0]: diff --git a/arch/arm/boot/dts/exynos5420.dtsi b/arch/arm/boot/dts/exynos5420.dtsi index 0ac5e0810e97..53b0a03843f2 100644 --- a/arch/arm/boot/dts/exynos5420.dtsi +++ b/arch/arm/boot/dts/exynos5420.dtsi @@ -270,6 +270,19 @@ reg = <0x10044120 0x20>; }; + disp1_pd: power-domain@100440C0 { + compatible = "samsung,exynos4210-pd"; + reg = <0x100440C0 0x20>; + clocks = <&clock CLK_FIN_PLL>, <&clock CLK_MOUT_SW_ACLK200>, + <&clock CLK_MOUT_USER_ACLK200_DISP1>, + <&clock CLK_MOUT_SW_ACLK300>, + <&clock CLK_MOUT_USER_ACLK300_DISP1>, + <&clock CLK_MOUT_SW_ACLK400>, + <&clock CLK_MOUT_USER_ACLK400_DISP1>; + clock-names = "oscclk", "pclk0", "clk0", + "pclk1", "clk1", "pclk2", "clk2"; + }; + pinctrl_0: pinctrl@1340 { compatible = "samsung,exynos5420-pinctrl"; reg = <0x1340 0x1000>; @@ -537,6 +550,7 @@ fimd: fimd@1440 { clocks = <&clock CLK_SCLK_FIMD1>, <&clock CLK_FIMD1>; clock-names = "sclk_fimd", "fimd"; + samsung,power-domain = <&disp1_pd>; }; adc: adc@12D1 { @@ -710,6 +724,7 @@ phy = <&hdmiphy>; samsung,syscon-phandle = <&pmu_system_controller>; status = "disabled"; + samsung,power-domain = <&disp1_pd>; }; hdmiphy: hdmiphy@145D { @@ -722,6 +737,7 @@ interrupts = <0 94 0>; clocks = <&clock CLK_MIXER>, <&clock CLK_SCLK_HDMI>; clock-names = "mixer", "sclk_hdmi"; + samsung,power-domain = <&disp1_pd>; }; gsc_0: video-scaler@13e0 { diff --git a/drivers/clk/samsung/clk-exynos5420.c b/drivers/clk/samsung/clk-exynos5420.c index 848d602efc06..07d666cc6a29 100644 --- a/drivers/clk/samsung/clk-exynos5420.c +++
Re: [PATCH v8 02/10] iommu/vt-d: Items required for kdump
On Mon, Jan 12, 2015 at 05:06:46PM +0100, Joerg Roedel wrote: On Mon, Jan 12, 2015 at 10:29:19AM -0500, Vivek Goyal wrote: Kdump has the notion of a backup region, where certain parts of the old kernel's memory can be moved to a different location (the first 640K on x86, as of now) so that the new kernel can make use of that memory. So we will just have to make sure that no part of this old page table falls into the backup region. Uuh, looks like the 'iommu-with-kdump-issue' isn't complicated enough yet ;) Sadly, your above statement is true for all hardware-accessible data structures in IOMMU code. I am thinking about how we can solve this; is there an easy way to allocate memory that is not in any backup region? Hmm..., there does not seem to be any easy way to do this. In fact, as of now, the kernel does not even know where the backup region is. All these details are managed completely by user space (except for the new kexec_file_load() syscall). That means we are left with ugly options now. - Define per-arch kexec backup regions in the kernel, export them to user space, and let kexec-tools make use of that definition (instead of defining its own). That way memory allocation code in the kernel can look at this backup area and skip it for certain allocations. Thanks Vivek ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC PATCH 5/5] arm64: hook up IOMMU dma_ops
With iommu_dma_ops in place, hook them up to the configuration code, so IOMMU-fronted devices will get them automatically. Signed-off-by: Robin Murphy robin.mur...@arm.com --- arch/arm64/Kconfig | 1 + arch/arm64/include/asm/dma-mapping.h | 10 +- arch/arm64/mm/dma-mapping.c | 22 ++ 3 files changed, 28 insertions(+), 5 deletions(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index b1f9a20..e2abcdc 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -66,6 +66,7 @@ config ARM64 select HAVE_PERF_USER_STACK_DUMP select HAVE_RCU_TABLE_FREE select HAVE_SYSCALL_TRACEPOINTS + select IOMMU_DMA if IOMMU_SUPPORT select IRQ_DOMAIN select MODULES_USE_ELF_RELA select NO_BOOTMEM diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h index 82082c4..0791a78 100644 --- a/arch/arm64/include/asm/dma-mapping.h +++ b/arch/arm64/include/asm/dma-mapping.h @@ -45,13 +45,13 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev) return __generic_dma_ops(dev); } -static inline void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, - struct iommu_ops *iommu, bool coherent) -{ - dev->archdata.dma_coherent = coherent; -} +void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, + struct iommu_ops *iommu, bool coherent); #define arch_setup_dma_ops arch_setup_dma_ops +void arch_teardown_dma_ops(struct device *dev); +#define arch_teardown_dma_ops arch_teardown_dma_ops + /* do not use this function in a driver */ static inline bool is_device_dma_coherent(struct device *dev) { diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c index 8e449a7..d52175d 100644 --- a/arch/arm64/mm/dma-mapping.c +++ b/arch/arm64/mm/dma-mapping.c @@ -729,10 +729,32 @@ static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, iommu_dma_release_mapping(mapping); } +static void __iommu_teardown_dma_ops(struct device *dev) +{ + if (dev->archdata.mapping) { + iommu_dma_detach_device(dev); + 
dev->archdata.dma_ops = NULL; + } +} + #else static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, struct iommu_ops *iommu) { } +static void __iommu_teardown_dma_ops(struct device *dev) { } + #endif /* CONFIG_IOMMU_DMA */ + +void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, + struct iommu_ops *iommu, bool coherent) +{ + dev->archdata.dma_coherent = coherent; + __iommu_setup_dma_ops(dev, dma_base, size, iommu); +} + +void arch_teardown_dma_ops(struct device *dev) +{ + __iommu_teardown_dma_ops(dev); +} -- 1.9.1
[RFC PATCH 3/5] iommu: implement common IOMMU ops for DMA mapping
Taking inspiration from the existing arch/arm code, break out some generic functions to interface the DMA-API to the IOMMU-API. This will do the bulk of the heavy lifting for IOMMU-backed dma-mapping. Whilst the target is arm64, rather than introduce yet another private implementation, place this in common code as the first step towards consolidating the numerous versions spread around between architecture code and IOMMU drivers. Signed-off-by: Robin Murphy robin.mur...@arm.com --- include/linux/dma-iommu.h | 78 lib/Kconfig | 8 + lib/Makefile | 1 + lib/dma-iommu.c | 455 ++ 4 files changed, 542 insertions(+) create mode 100644 include/linux/dma-iommu.h create mode 100644 lib/dma-iommu.c diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h new file mode 100644 index 000..4515407 --- /dev/null +++ b/include/linux/dma-iommu.h @@ -0,0 +1,78 @@ +/* + * Copyright (C) 2014 ARM Ltd. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. 
+ */ +#ifndef __DMA_IOMMU_H +#define __DMA_IOMMU_H + +#ifdef __KERNEL__ + +#include <linux/types.h> +#include <linux/iommu.h> + +#ifdef CONFIG_IOMMU_DMA + +int iommu_dma_init(void); + +struct iommu_dma_mapping *iommu_dma_create_mapping(struct iommu_ops *ops, + dma_addr_t base, size_t size); +void iommu_dma_release_mapping(struct iommu_dma_mapping *mapping); + +dma_addr_t iommu_dma_create_iova_mapping(struct device *dev, + struct page **pages, size_t size, bool coherent); +int iommu_dma_release_iova_mapping(struct device *dev, dma_addr_t iova, + size_t size); + +struct page **iommu_dma_alloc_buffer(struct device *dev, size_t size, + gfp_t gfp, struct dma_attrs *attrs, + void (*clear_buffer)(struct page *page, size_t size)); +int iommu_dma_free_buffer(struct device *dev, struct page **pages, size_t size, + struct dma_attrs *attrs); + +dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page, + unsigned long offset, size_t size, enum dma_data_direction dir, + struct dma_attrs *attrs); +dma_addr_t iommu_dma_coherent_map_page(struct device *dev, struct page *page, + unsigned long offset, size_t size, enum dma_data_direction dir, + struct dma_attrs *attrs); +void iommu_dma_unmap_page(struct device *dev, dma_addr_t handle, size_t size, + enum dma_data_direction dir, struct dma_attrs *attrs); + +int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents, + enum dma_data_direction dir, struct dma_attrs *attrs); +int iommu_dma_coherent_map_sg(struct device *dev, struct scatterlist *sg, + int nents, enum dma_data_direction dir, + struct dma_attrs *attrs); +void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sgl, int nents, + enum dma_data_direction dir, struct dma_attrs *attrs); + +int iommu_dma_attach_device(struct device *dev, struct iommu_dma_mapping *mapping); +void iommu_dma_detach_device(struct device *dev); + +int iommu_dma_supported(struct device *hwdev, u64 mask); +int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr); 
+ +phys_addr_t iova_to_phys(struct device *dev, dma_addr_t dev_addr); + +#else + +static inline phys_addr_t iova_to_phys(struct device *dev, dma_addr_t dev_addr) +{ + return 0; +} + +#endif /* CONFIG_IOMMU_DMA */ + +#endif /* __KERNEL__ */ +#endif /* __DMA_IOMMU_H */ diff --git a/lib/Kconfig b/lib/Kconfig index 54cf309..965d027 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -518,4 +518,12 @@ source "lib/fonts/Kconfig" config ARCH_HAS_SG_CHAIN def_bool n +# +# IOMMU-agnostic DMA-mapping layer +# +config IOMMU_DMA + def_bool n + depends on IOMMU_SUPPORT && ARCH_HAS_SG_CHAIN && NEED_SG_DMA_LENGTH + select IOMMU_IOVA + endmenu diff --git a/lib/Makefile b/lib/Makefile index 3c3b30b..e4b6134 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -103,6 +103,7 @@ obj-$(CONFIG_AUDIT_COMPAT_GENERIC) += compat_audit.o obj-$(CONFIG_SWIOTLB) += swiotlb.o obj-$(CONFIG_IOMMU_HELPER) += iommu-helper.o +obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o obj-$(CONFIG_FAULT_INJECTION) += fault-inject.o obj-$(CONFIG_NOTIFIER_ERROR_INJECTION) += notifier-error-inject.o obj-$(CONFIG_CPU_NOTIFIER_ERROR_INJECT) += cpu-notifier-error-inject.o diff --git a/lib/dma-iommu.c
[RFC PATCH 1/5] arm64: Combine coherent and non-coherent swiotlb dma_ops
From: Catalin Marinas catalin.mari...@arm.com Since dev_archdata now has a dma_coherent state, combine the two coherent and non-coherent operations and remove their declaration, together with set_dma_ops, from the arch dma-mapping.h file. Signed-off-by: Catalin Marinas catalin.mari...@arm.com --- arch/arm64/include/asm/dma-mapping.h | 11 +--- arch/arm64/mm/dma-mapping.c | 116 --- 2 files changed, 54 insertions(+), 73 deletions(-) diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h index 9ce3e68..6932bb5 100644 --- a/arch/arm64/include/asm/dma-mapping.h +++ b/arch/arm64/include/asm/dma-mapping.h @@ -28,8 +28,6 @@ #define DMA_ERROR_CODE (~(dma_addr_t)0) extern struct dma_map_ops *dma_ops; -extern struct dma_map_ops coherent_swiotlb_dma_ops; -extern struct dma_map_ops noncoherent_swiotlb_dma_ops; static inline struct dma_map_ops *__generic_dma_ops(struct device *dev) { @@ -47,23 +45,18 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev) return __generic_dma_ops(dev); } -static inline void set_dma_ops(struct device *dev, struct dma_map_ops *ops) -{ - dev->archdata.dma_ops = ops; -} - static inline void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, struct iommu_ops *iommu, bool coherent) { dev->archdata.dma_coherent = coherent; - if (coherent) - set_dma_ops(dev, &coherent_swiotlb_dma_ops); } #define arch_setup_dma_ops arch_setup_dma_ops /* do not use this function in a driver */ static inline bool is_device_dma_coherent(struct device *dev) { + if (!dev) + return false; return dev->archdata.dma_coherent; } diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c index d920942..0a24b9b 100644 --- a/arch/arm64/mm/dma-mapping.c +++ b/arch/arm64/mm/dma-mapping.c @@ -134,16 +134,17 @@ static void __dma_free_coherent(struct device *dev, size_t size, swiotlb_free_coherent(dev, size, vaddr, dma_handle); } -static void *__dma_alloc_noncoherent(struct device *dev, size_t size, -dma_addr_t *dma_handle, 
gfp_t flags, -struct dma_attrs *attrs) +static void *__dma_alloc(struct device *dev, size_t size, +dma_addr_t *dma_handle, gfp_t flags, +struct dma_attrs *attrs) { struct page *page; void *ptr, *coherent_ptr; + bool coherent = is_device_dma_coherent(dev); size = PAGE_ALIGN(size); - if (!(flags & __GFP_WAIT)) { + if (!coherent && !(flags & __GFP_WAIT)) { struct page *page = NULL; void *addr = __alloc_from_pool(size, &page); @@ -151,13 +152,16 @@ static void *__dma_alloc_noncoherent(struct device *dev, size_t size, *dma_handle = phys_to_dma(dev, page_to_phys(page)); return addr; - } ptr = __dma_alloc_coherent(dev, size, dma_handle, flags, attrs); if (!ptr) goto no_mem; + /* no need for non-cacheable mapping if coherent */ + if (coherent) + return ptr; + /* remove any dirty cache lines on the kernel alias */ __dma_flush_range(ptr, ptr + size); @@ -179,15 +183,17 @@ no_mem: return NULL; } -static void __dma_free_noncoherent(struct device *dev, size_t size, - void *vaddr, dma_addr_t dma_handle, - struct dma_attrs *attrs) +static void __dma_free(struct device *dev, size_t size, + void *vaddr, dma_addr_t dma_handle, + struct dma_attrs *attrs) { void *swiotlb_addr = phys_to_virt(dma_to_phys(dev, dma_handle)); - if (__free_from_pool(vaddr, size)) - return; - vunmap(vaddr); + if (!is_device_dma_coherent(dev)) { + if (__free_from_pool(vaddr, size)) + return; + vunmap(vaddr); + } __dma_free_coherent(dev, size, swiotlb_addr, dma_handle, attrs); } @@ -199,7 +205,8 @@ static dma_addr_t __swiotlb_map_page(struct device *dev, struct page *page, dma_addr_t dev_addr; dev_addr = swiotlb_map_page(dev, page, offset, size, dir, attrs); - __dma_map_area(phys_to_virt(dma_to_phys(dev, dev_addr)), size, dir); + if (!is_device_dma_coherent(dev)) + __dma_map_area(phys_to_virt(dma_to_phys(dev, dev_addr)), size, dir); return dev_addr; } @@ -209,7 +216,8 @@ static void __swiotlb_unmap_page(struct device *dev, dma_addr_t dev_addr, size_t size, enum dma_data_direction dir, struct dma_attrs *attrs) { - 
__dma_unmap_area(phys_to_virt(dma_to_phys(dev, dev_addr)), size, dir); + if
[RFC PATCH 2/5] arm64: implement generic IOMMU configuration
Add the necessary call to of_iommu_init. Signed-off-by: Robin Murphy robin.mur...@arm.com --- arch/arm64/kernel/setup.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index b809911..8304141 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -40,6 +40,7 @@ #include <linux/fs.h> #include <linux/proc_fs.h> #include <linux/memblock.h> +#include <linux/of_iommu.h> #include <linux/of_fdt.h> #include <linux/of_platform.h> #include <linux/efi.h> @@ -424,6 +425,7 @@ void __init setup_arch(char **cmdline_p) static int __init arm64_device_init(void) { + of_iommu_init(); of_platform_populate(NULL, of_default_bus_match_table, NULL, NULL); return 0; } -- 1.9.1
[RFC PATCH 0/5] arm64: IOMMU-backed DMA mapping
Hi all, Whilst it's a long way off perfect, this has reached the point of being functional and stable enough to be useful, so here it is. The core consists of the meat of the arch/arm implementation modified to remove the assumption of PAGE_SIZE pages and ported over to the Intel IOVA allocator instead of the bitmap-based one. For that, this series depends on my Genericise the IOVA allocator series posted earlier[1]. There are plenty of obvious things still to do, including: * Domain and group handling is all wrong, but that's a bigger problem. For the moment it does more or less the same thing as the arch/arm code, which at least works for the one-IOMMU-per-device situation. * IOMMU domains and IOVA domains probably want to be better integrated with devices and each other, rather than having a proliferation of arch-specific structs. * The temporary map_sg implementation - I have a 'proper' iommu_map_sg based one in progress, but since the simple one works it's not been as high a priority. * Port arch/arm over to it. I'd guess it might be preferable to merge this through arm64 first, though, rather than overcomplicate matters. * There may well be scope for streamlining and tidying up the copied parts - In general I've simply avoided touching anything I don't fully understand. * In the same vein, I'm sure lots of it is fairly ARM-specific, so will need longer-term work to become truly generic. 
[1]:http://thread.gmane.org/gmane.linux.kernel.iommu/8208 Catalin Marinas (1): arm64: Combine coherent and non-coherent swiotlb dma_ops Robin Murphy (4): arm64: implement generic IOMMU configuration iommu: implement common IOMMU ops for DMA mapping arm64: add IOMMU dma_ops arm64: hook up IOMMU dma_ops arch/arm64/Kconfig | 1 + arch/arm64/include/asm/device.h | 3 + arch/arm64/include/asm/dma-mapping.h | 33 +-- arch/arm64/kernel/setup.c| 2 + arch/arm64/mm/dma-mapping.c | 435 - include/linux/dma-iommu.h| 78 ++ lib/Kconfig | 8 + lib/Makefile | 1 + lib/dma-iommu.c | 455 +++ 9 files changed, 938 insertions(+), 78 deletions(-) create mode 100644 include/linux/dma-iommu.h create mode 100644 lib/dma-iommu.c -- 1.9.1
[RFC PATCH 4/5] arm64: add IOMMU dma_ops
Taking some inspiration from the arch/arm code, implement the arch-specific side of the DMA mapping ops using the new IOMMU-DMA layer. Signed-off-by: Robin Murphy robin.mur...@arm.com --- arch/arm64/include/asm/device.h | 3 + arch/arm64/include/asm/dma-mapping.h | 12 ++ arch/arm64/mm/dma-mapping.c | 297 +++ 3 files changed, 312 insertions(+) diff --git a/arch/arm64/include/asm/device.h b/arch/arm64/include/asm/device.h index 243ef25..c17f100 100644 --- a/arch/arm64/include/asm/device.h +++ b/arch/arm64/include/asm/device.h @@ -20,6 +20,9 @@ struct dev_archdata { struct dma_map_ops *dma_ops; #ifdef CONFIG_IOMMU_API void *iommu;/* private IOMMU data */ +#ifdef CONFIG_IOMMU_DMA + struct iommu_dma_mapping *mapping; +#endif #endif bool dma_coherent; }; diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h index 6932bb5..82082c4 100644 --- a/arch/arm64/include/asm/dma-mapping.h +++ b/arch/arm64/include/asm/dma-mapping.h @@ -64,11 +64,23 @@ static inline bool is_device_dma_coherent(struct device *dev) static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr) { +#ifdef CONFIG_IOMMU_DMA + /* We don't have an easy way of dealing with this... 
*/ + BUG_ON(dev->archdata.mapping); +#endif return (dma_addr_t)paddr; } +#ifdef CONFIG_IOMMU_DMA +phys_addr_t iova_to_phys(struct device *dev, dma_addr_t dev_addr); +#endif + static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dev_addr) { +#ifdef CONFIG_IOMMU_DMA + if (dev->archdata.mapping) + return iova_to_phys(dev, dev_addr); +#endif return (phys_addr_t)dev_addr; } diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c index 0a24b9b..8e449a7 100644 --- a/arch/arm64/mm/dma-mapping.c +++ b/arch/arm64/mm/dma-mapping.c @@ -23,6 +23,7 @@ #include <linux/genalloc.h> #include <linux/dma-mapping.h> #include <linux/dma-contiguous.h> +#include <linux/dma-iommu.h> #include <linux/vmalloc.h> #include <linux/swiotlb.h> @@ -426,6 +427,9 @@ static int __init arm64_dma_init(void) ret |= swiotlb_late_init(); ret |= atomic_pool_init(); +#ifdef CONFIG_IOMMU_DMA + ret |= iommu_dma_init(); +#endif return ret; } @@ -439,3 +443,296 @@ static int __init dma_debug_do_init(void) return 0; } fs_initcall(dma_debug_do_init); + + +#ifdef CONFIG_IOMMU_DMA + +static struct page **__atomic_get_pages(void *addr) +{ + struct page *page; + phys_addr_t phys; + + phys = gen_pool_virt_to_phys(atomic_pool, (unsigned long)addr); + page = phys_to_page(phys); + + return (struct page **)page; +} + +static struct page **__iommu_get_pages(void *cpu_addr, struct dma_attrs *attrs) +{ + struct vm_struct *area; + + if (__in_atomic_pool(cpu_addr, PAGE_SIZE)) + return __atomic_get_pages(cpu_addr); + + area = find_vm_area(cpu_addr); + if (!area) + return NULL; + + return area->pages; +} + +static void *__iommu_alloc_atomic(struct device *dev, size_t size, + dma_addr_t *handle, bool coherent) +{ + struct page *page; + void *addr; + + addr = __alloc_from_pool(size, &page); + if (!addr) + return NULL; + + *handle = iommu_dma_create_iova_mapping(dev, &page, size, coherent); + if (*handle == DMA_ERROR_CODE) { + __free_from_pool(addr, size); + return NULL; + } + return addr; + } + return addr; +} + +static void 
__iommu_free_atomic(struct device *dev, void *cpu_addr, + dma_addr_t handle, size_t size) +{ + iommu_dma_release_iova_mapping(dev, handle, size); + __free_from_pool(cpu_addr, size); +} + +static void __dma_clear_buffer(struct page *page, size_t size) +{ + void *ptr = page_address(page); + + memset(ptr, 0, size); + __dma_flush_range(ptr, ptr + size); +} + +static void *__iommu_alloc_attrs(struct device *dev, size_t size, + dma_addr_t *handle, gfp_t gfp, struct dma_attrs *attrs) +{ + bool coherent = is_device_dma_coherent(dev); + pgprot_t prot = coherent ? __pgprot(PROT_NORMAL) : + __pgprot(PROT_NORMAL_NC); + struct page **pages; + void *addr = NULL; + + *handle = DMA_ERROR_CODE; + size = PAGE_ALIGN(size); + + if (!(gfp & __GFP_WAIT)) + return __iommu_alloc_atomic(dev, size, handle, coherent); + /* +* Following is a work-around (a.k.a. hack) to prevent pages +* with __GFP_COMP being passed to split_page() which cannot +* handle them. The real problem is that this flag probably +* should be 0 on ARM as it is not supported on this +* platform; see CONFIG_HUGETLBFS. +*/ + gfp &= ~(__GFP_COMP); + + pages = iommu_dma_alloc_buffer(dev, size,
RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
-Original Message- From: Paolo Bonzini [mailto:pbonz...@redhat.com] Sent: Friday, January 09, 2015 10:56 PM To: Radim Krčmář; Wu, Feng Cc: t...@linutronix.de; mi...@redhat.com; h...@zytor.com; x...@kernel.org; g...@kernel.org; dw...@infradead.org; j...@8bytes.org; alex.william...@redhat.com; jiang@linux.intel.com; eric.au...@linaro.org; linux-ker...@vger.kernel.org; iommu@lists.linux-foundation.org; k...@vger.kernel.org Subject: Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI On 09/01/2015 15:54, Radim Krčmář wrote: There are two points relevant to this patch in the new KVM implementation (KVM: x86: amend APIC lowest priority arbitration, https://lkml.org/lkml/2015/1/9/362): 1) lowest priority depends on TPR 2) there is no need for balancing (1) has to be considered with PI as well. The chipset doesn't support it. :( I kept (2) to avoid whining from people building on that behaviour, but lowest priority backed by PI could be transparent without it. The patch below removes the balancing, but I am not sure this is a price we allowed ourselves to pay ... what are your opinions? I wouldn't mind, but it requires a lot of benchmarking. In fact, the real hardware may do lowest priority in a round-robin way; the new hardware doesn't even consider the TPR for lowest-priority interrupt delivery. As discussed with Paolo before, I will submit a patch to support lowest priority for PI after this series is merged. Thanks, Feng Paolo
Re: [PATCH v8 01/10] iommu/vt-d: Update iommu_attach_domain() and its callers
On 01/12/2015 11:18 PM, Joerg Roedel wrote: On Mon, Jan 12, 2015 at 03:06:19PM +0800, Li, Zhen-Hua wrote: Allow specification of the domain-id for the new domain. This patch only adds the 'did' parameter to iommu_attach_domain() and modifies all of its callers to specify the default value of -1, which says "no did specified, allocate a new one". I think it's better to keep the old iommu_attach_domain() interface in place and introduce a new function (like iommu_attach_domain_with_id() or something) which has the additional parameter. Then you can rewrite iommu_attach_domain(): iommu_attach_domain(...) { return iommu_attach_domain_with_id(..., -1); } This way you don't have to update all the callers of iommu_attach_domain() and the interface is more readable. Joerg That's a good way. I will do this in the next version. Thanks Zhenhua
Re: [PATCH v3 00/19] Exynos SYSMMU (IOMMU) integration with DT and DMA-mapping subsystem
Hi, On 01/13/2015 01:09 AM, Javier Martinez Canillas wrote: Hello Joonyoung, On 01/12/2015 07:40 AM, Joonyoung Shim wrote: And also making changes to the clocks in the clk-exynos5420 driver. Can you please explain the rationale for those changes? I'm asking because without your clock changes (only adding the DISP1 pd and making the devices its consumers), I get HDMI output too but the video is even worse. This [0] is the minimal change I have on top of 3.19-rc3 to have some output. I just referred to the patches below: http://comments.gmane.org/gmane.linux.kernel.samsung-soc/34576 But I'm not sure whether the DISP1 power domain is the same case as the MFC power domain. Thanks a lot for sharing those patches, now your changes are much clearer to me. So there seem to be two issues here: one is the mixer and hdmi modules not being attached to the DISP1 power domain, and the other is the clock setup not being correct for proper HDMI video output. Hmm, I can still see normal HDMI output from the latest upstream kernel (3.19-rc4) with my kernel changes and the u-boot changes (DISP1 power domain disable) from my prior mail on the Odroid XU3 board. I thought you said in another email that after commit 2ed127697eb1 which landed on 3.19-rc1 you had bad HDMI output? Your changes were missing the SW_ACLK_400_DISP1 and USER_ACLK_400_DISP1 clock mux outputs that go to internal buses in the DISP1 block. Adding IDs for these in the exynos5420 clock driver and to the parent/input clock pairs list in the DISP1 power domain gives me good HDMI output on 3.19-rc2. Also, SW_ACLK_300_DISP1 and USER_ACLK_300_DISP1 are needed for the FIMD parent and input clock respectively. Adding those to the clocks list of the DISP1 power domain gives me working display + HDMI on my Exynos5800 Peach Pi. These are the changes I have now [0]. Please let me know what you think. Good, it's working with your patch, without the u-boot changes and without reverting commit 2ed127697eb1. 
I didn't have this issue when testing your patch against 3.19-rc2. From your log I see that you are testing on 3.18.1. So maybe it makes sense to test with the latest kernel version, since this HDMI issue qualifies as a 3.19-rc fix? Since commit 2ed127697eb1 (PM / Domains: Power on the PM domain right after attach completes), which landed in 3.19-rc1, I see that the power domain is powered on when a device is attached. So maybe that is what makes the difference here? I'm not sure, but I get the same error results from 3.19-rc4. Did you test using the exynos drm driver? I used modetest from libdrm. Yes, I was not able to trigger that by running modetest, but by turning off my HDMI monitor and then turning it on again. When the monitor is turned on I see a "Power domain power-domain disable failed" message and the imprecise external abort error. I had to disable CONFIG_DRM_EXYNOS_DP in order to trigger it, though, which is why I was not able to reproduce it before. I think, though, that this is a separate issue from the HDMI not working, since power domains should be able to have many consumer devices, and I see that other power domains are used that way. OK, we need more investigation. Thanks.
Re: [PATCH v8 02/10] iommu/vt-d: Items required for kdump
On Mon, Jan 12, 2015 at 10:29:19AM -0500, Vivek Goyal wrote: Kdump has the notion of a backup region, where certain parts of the old kernel's memory can be moved to a different location (the first 640K on x86, as of now) so that the new kernel can make use of that memory. So we will just have to make sure that no part of this old page table falls into the backup region. Uuh, looks like the 'iommu-with-kdump-issue' isn't complicated enough yet ;) Sadly, your above statement is true for all hardware-accessible data structures in IOMMU code. I am thinking about how we can solve this; is there an easy way to allocate memory that is not in any backup region? Thanks, Joerg
Re: [RFC PATCH 0/4] Genericise the IOVA allocator
Hi Robin, On Tue, Nov 25, 2014 at 05:27:24PM +, Robin Murphy wrote: Hi all, I've been implementing IOMMU DMA mapping for arm64, based on tidied-up parts of the existing arch/arm/mm/dma-mapping.c with a clear divide between the arch-specific parts and the general DMA-API to IOMMU-API layer, so that the latter can be shared; similar to what Ritesh started before and was unable to complete[1], but working in the other direction. The first part of that tidy-up involved ripping out the homebrewed IOVA allocator and plumbing in iova.c, necessitating the changes presented here. The rest is currently sat under arch/arm64 for the sake of getting it working quickly with minimal impact - ideally I'd move it out and port arch/arm before merging, but I don't know quite how impatient people are. Regardless of that decision, this bit stands alone, so here it is. Feel free to ignore patches 1 and 2, since I see Sakari has recently posted a more thorough series for that[2], that frankly looks nicer ;) I've merely left them in as context here. [1]:http://thread.gmane.org/gmane.linux.ports.arm.kernel/331299 [2]:http://article.gmane.org/gmane.linux.kernel.iommu/7436 Robin Murphy (4): iommu: build iova.c for any IOMMU iommu: consolidate IOVA allocator code iommu: make IOVA domain low limit flexible iommu: make IOVA domain page size explicit Thanks for doing this, I like this patch-set. I would also appreciate if someone from Intel could have a look at it, David? Besides, can you please re-post this patch-set rebased to the latest upstream with the better versions of patches 1 and 2? I am considering applying these changes then. Joerg
[PATCH 1/4] iommu: allow building iova.c independently
In preparation for sharing the IOVA allocator, split it out under its own Kconfig symbol. Signed-off-by: Robin Murphy robin.mur...@arm.com --- drivers/iommu/Kconfig | 4 drivers/iommu/Makefile | 3 ++- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 325188e..a839ca9 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -13,6 +13,9 @@ menuconfig IOMMU_SUPPORT if IOMMU_SUPPORT +config IOMMU_IOVA + bool + config OF_IOMMU def_bool y depends on OF && IOMMU_API @@ -91,6 +94,7 @@ config INTEL_IOMMU bool "Support for Intel IOMMU using DMA Remapping Devices" depends on PCI_MSI && ACPI && (X86 || IA64_GENERIC) select IOMMU_API + select IOMMU_IOVA select DMAR_TABLE help DMA remapping (DMAR) devices support enables independent address diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 7b976f2..0b1b94e 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -1,13 +1,14 @@ obj-$(CONFIG_IOMMU_API) += iommu.o obj-$(CONFIG_IOMMU_API) += iommu-traces.o obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o +obj-$(CONFIG_IOMMU_IOVA) += iova.o obj-$(CONFIG_OF_IOMMU) += of_iommu.o obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o msm_iommu_dev.o obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o obj-$(CONFIG_ARM_SMMU) += arm-smmu.o obj-$(CONFIG_DMAR_TABLE) += dmar.o -obj-$(CONFIG_INTEL_IOMMU) += iova.o intel-iommu.o +obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o obj-$(CONFIG_IRQ_REMAP) += intel_irq_remapping.o irq_remapping.o obj-$(CONFIG_OMAP_IOMMU) += omap-iommu.o -- 1.9.1
[PATCH 0/4] Genericise the IOVA allocator
Hi all,

Here's an update of my previous RFC[1] in preparation for hooking the IOVA allocator up to the arm64 DMA mapping API, rebased onto 3.19-rc3.

I tried rebasing patches 3 and 4 onto Sakari's RFC series[2] (the merge conflict is pretty trivial); however, I found that that series applied to rc3 causes a build error in intel-iommu.c. Thus, for now, I've left in my simpler patches 1 and 2 for breaking out the library. Hopefully we can reach some consensus on that.

Tested on arm64 (DMA mapping series coming soon), and compile-tested with x86_64_defconfig.

Changes since the RFC:
- Patch 1: use a proper Kconfig symbol rather than a hack
- Patch 4: sanity-check for powers of two as well, and clarify the comment

[1]: http://thread.gmane.org/gmane.linux.kernel.iommu/7480
[2]: http://thread.gmane.org/gmane.linux.kernel.iommu/7436

Robin Murphy (4):
  iommu: allow building iova.c independently
  iommu: consolidate IOVA allocator code
  iommu: make IOVA domain low limit flexible
  iommu: make IOVA domain page size explicit

 drivers/iommu/Kconfig       |  4
 drivers/iommu/Makefile      |  3 ++-
 drivers/iommu/intel-iommu.c | 45 ++
 drivers/iommu/iova.c        | 53 +
 include/linux/iova.h        | 41 +++
 5 files changed, 103 insertions(+), 43 deletions(-)

-- 
1.9.1
[PATCH 3/4] iommu: make IOVA domain low limit flexible
To share the IOVA allocator with other architectures, it needs to accommodate more general aperture restrictions; move the lower limit from a compile-time constant to a runtime domain property to allow IOVA domains with different requirements to co-exist.

Also reword the slightly unclear description of alloc_iova since we're touching it anyway.

Signed-off-by: Robin Murphy robin.mur...@arm.com
---
 drivers/iommu/intel-iommu.c |  9 ++---
 drivers/iommu/iova.c        | 10 ++
 include/linux/iova.h        |  7 +++
 3 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 5699653..275d056 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -71,6 +71,9 @@
 					__DOMAIN_MAX_PFN(gaw), (unsigned long)-1))
 #define DOMAIN_MAX_ADDR(gaw)	(((uint64_t)__DOMAIN_MAX_PFN(gaw)) << VTD_PAGE_SHIFT)
 
+/* IO virtual address start page frame number */
+#define IOVA_START_PFN		(1)
+
 #define IOVA_PFN(addr)		((addr) >> PAGE_SHIFT)
 #define DMA_32BIT_PFN		IOVA_PFN(DMA_BIT_MASK(32))
 #define DMA_64BIT_PFN		IOVA_PFN(DMA_BIT_MASK(64))
@@ -1632,7 +1635,7 @@ static int dmar_init_reserved_ranges(void)
 	struct iova *iova;
 	int i;
 
-	init_iova_domain(&reserved_iova_list, DMA_32BIT_PFN);
+	init_iova_domain(&reserved_iova_list, IOVA_START_PFN, DMA_32BIT_PFN);
 	lockdep_set_class(&reserved_iova_list.iova_rbtree_lock,
 		&reserved_rbtree_key);
@@ -1690,7 +1693,7 @@ static int domain_init(struct dmar_domain *domain, int guest_width)
 	int adjust_width, agaw;
 	unsigned long sagaw;
 
-	init_iova_domain(&domain->iovad, DMA_32BIT_PFN);
+	init_iova_domain(&domain->iovad, IOVA_START_PFN, DMA_32BIT_PFN);
 	domain_reserve_special_ranges(domain);
 
 	/* calculate AGAW */
@@ -4321,7 +4324,7 @@ static int md_domain_init(struct dmar_domain *domain, int guest_width)
 {
 	int adjust_width;
 
-	init_iova_domain(&domain->iovad, DMA_32BIT_PFN);
+	init_iova_domain(&domain->iovad, IOVA_START_PFN, DMA_32BIT_PFN);
 	domain_reserve_special_ranges(domain);
 
 	/* calculate AGAW */
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 520b8c8..a3dbba8 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -55,11 +55,13 @@ void free_iova_mem(struct iova *iova)
 }
 
 void
-init_iova_domain(struct iova_domain *iovad, unsigned long pfn_32bit)
+init_iova_domain(struct iova_domain *iovad, unsigned long start_pfn,
+	unsigned long pfn_32bit)
 {
 	spin_lock_init(&iovad->iova_rbtree_lock);
 	iovad->rbroot = RB_ROOT;
 	iovad->cached32_node = NULL;
+	iovad->start_pfn = start_pfn;
 	iovad->dma_32bit_pfn = pfn_32bit;
 }
 
@@ -162,7 +164,7 @@ move_left:
 	if (!curr) {
 		if (size_aligned)
 			pad_size = iova_get_pad_size(size, limit_pfn);
-		if ((IOVA_START_PFN + size + pad_size) > limit_pfn) {
+		if ((iovad->start_pfn + size + pad_size) > limit_pfn) {
 			spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
 			return -ENOMEM;
 		}
@@ -237,8 +239,8 @@ iova_insert_rbtree(struct rb_root *root, struct iova *iova)
 * @size: - size of page frames to allocate
 * @limit_pfn: - max limit address
 * @size_aligned: - set if size_aligned address range is required
- * This function allocates an iova in the range limit_pfn to IOVA_START_PFN
- * looking from limit_pfn instead from IOVA_START_PFN. If the size_aligned
+ * This function allocates an iova in the range iovad->start_pfn to limit_pfn,
+ * searching top-down from limit_pfn to iovad->start_pfn. If the size_aligned
 * flag is set then the allocated address iova->pfn_lo will be naturally
 * aligned on roundup_power_of_two(size).
 */
diff --git a/include/linux/iova.h b/include/linux/iova.h
index ad0507c..591b196 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -16,9 +16,6 @@
 #include <linux/rbtree.h>
 #include <linux/dma-mapping.h>
 
-/* IO virtual address start page frame number */
-#define IOVA_START_PFN		(1)
-
 /* iova structure */
 struct iova {
 	struct rb_node node;
@@ -31,6 +28,7 @@ struct iova_domain {
 	spinlock_t	iova_rbtree_lock; /* Lock to protect update of rbtree */
 	struct rb_root	rbroot;		/* iova domain rbtree root */
 	struct rb_node	*cached32_node; /* Save last alloced node */
+	unsigned long	start_pfn;	/* Lower limit for this domain */
 	unsigned long	dma_32bit_pfn;
 };
 
@@ -52,7 +50,8 @@ struct iova *alloc_iova(struct iova_domain *iovad, unsigned long size,
 struct iova *reserve_iova(struct iova_domain *iovad, unsigned long pfn_lo,
 	unsigned long pfn_hi);
 void copy_reserved_iova(struct iova_domain *from, struct iova_domain *to);
-void init_iova_domain(struct iova_domain *iovad, unsigned long pfn_32bit);
+void init_iova_domain(struct iova_domain *iovad, unsigned long start_pfn,
+	unsigned long pfn_32bit);
[PATCH 4/4] iommu: make IOVA domain page size explicit
Systems may contain heterogeneous IOMMUs supporting differing minimum page sizes, which may also not be common with the CPU page size. Thus it is practical to have an explicit notion of IOVA granularity to simplify handling of mapping and allocation constraints.

As an initial step, move the IOVA page granularity from an implicit compile-time constant to a per-domain property so we can make use of it in IOVA domain context at runtime. To keep the abstraction tidy, extend the little API of inline iova_* helpers to parallel some of the equivalent PAGE_* macros.

Signed-off-by: Robin Murphy robin.mur...@arm.com
---
 drivers/iommu/intel-iommu.c |  9 ++---
 drivers/iommu/iova.c        | 12 ++--
 include/linux/iova.h        | 35 +--
 3 files changed, 49 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 275d056..a0f5817 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1635,7 +1635,8 @@ static int dmar_init_reserved_ranges(void)
 	struct iova *iova;
 	int i;
 
-	init_iova_domain(&reserved_iova_list, IOVA_START_PFN, DMA_32BIT_PFN);
+	init_iova_domain(&reserved_iova_list, VTD_PAGE_SIZE, IOVA_START_PFN,
+			DMA_32BIT_PFN);
 	lockdep_set_class(&reserved_iova_list.iova_rbtree_lock,
 		&reserved_rbtree_key);
@@ -1693,7 +1694,8 @@ static int domain_init(struct dmar_domain *domain, int guest_width)
 	int adjust_width, agaw;
 	unsigned long sagaw;
 
-	init_iova_domain(&domain->iovad, IOVA_START_PFN, DMA_32BIT_PFN);
+	init_iova_domain(&domain->iovad, VTD_PAGE_SIZE, IOVA_START_PFN,
+			DMA_32BIT_PFN);
 	domain_reserve_special_ranges(domain);
 
 	/* calculate AGAW */
@@ -4324,7 +4326,8 @@ static int md_domain_init(struct dmar_domain *domain, int guest_width)
 {
 	int adjust_width;
 
-	init_iova_domain(&domain->iovad, IOVA_START_PFN, DMA_32BIT_PFN);
+	init_iova_domain(&domain->iovad, VTD_PAGE_SIZE, IOVA_START_PFN,
+			DMA_32BIT_PFN);
 	domain_reserve_special_ranges(domain);
 
 	/* calculate AGAW */
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index a3dbba8..9dd8208 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -55,12 +55,20 @@ void free_iova_mem(struct iova *iova)
 }
 
 void
-init_iova_domain(struct iova_domain *iovad, unsigned long start_pfn,
-	unsigned long pfn_32bit)
+init_iova_domain(struct iova_domain *iovad, unsigned long granule,
+	unsigned long start_pfn, unsigned long pfn_32bit)
 {
+	/*
+	 * IOVA granularity will normally be equal to the smallest
+	 * supported IOMMU page size; both *must* be capable of
+	 * representing individual CPU pages exactly.
+	 */
+	BUG_ON((granule > PAGE_SIZE) || !is_power_of_2(granule));
+
 	spin_lock_init(&iovad->iova_rbtree_lock);
 	iovad->rbroot = RB_ROOT;
 	iovad->cached32_node = NULL;
+	iovad->granule = granule;
 	iovad->start_pfn = start_pfn;
 	iovad->dma_32bit_pfn = pfn_32bit;
 }
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 591b196..3920a19 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -28,6 +28,7 @@ struct iova_domain {
 	spinlock_t	iova_rbtree_lock; /* Lock to protect update of rbtree */
 	struct rb_root	rbroot;		/* iova domain rbtree root */
 	struct rb_node	*cached32_node; /* Save last alloced node */
+	unsigned long	granule;	/* pfn granularity for this domain */
 	unsigned long	start_pfn;	/* Lower limit for this domain */
 	unsigned long	dma_32bit_pfn;
 };
@@ -37,6 +38,36 @@ static inline unsigned long iova_size(struct iova *iova)
 	return iova->pfn_hi - iova->pfn_lo + 1;
 }
 
+static inline unsigned long iova_shift(struct iova_domain *iovad)
+{
+	return __ffs(iovad->granule);
+}
+
+static inline unsigned long iova_mask(struct iova_domain *iovad)
+{
+	return iovad->granule - 1;
+}
+
+static inline size_t iova_offset(struct iova_domain *iovad, dma_addr_t iova)
+{
+	return iova & iova_mask(iovad);
+}
+
+static inline size_t iova_align(struct iova_domain *iovad, size_t size)
+{
+	return ALIGN(size, iovad->granule);
+}
+
+static inline dma_addr_t iova_dma_addr(struct iova_domain *iovad, struct iova *iova)
+{
+	return (dma_addr_t)iova->pfn_lo << iova_shift(iovad);
+}
+
+static inline unsigned long iova_pfn(struct iova_domain *iovad, dma_addr_t iova)
+{
+	return iova >> iova_shift(iovad);
+}
+
 int iommu_iova_cache_init(void);
 void iommu_iova_cache_destroy(void);
 
@@ -50,8 +81,8 @@ struct iova *alloc_iova(struct iova_domain *iovad, unsigned long size,
 struct iova *reserve_iova(struct iova_domain *iovad, unsigned long pfn_lo,
 	unsigned long pfn_hi);
 void copy_reserved_iova(struct iova_domain *from, struct iova_domain *to);
-void init_iova_domain(struct iova_domain *iovad, unsigned long start_pfn,
-	unsigned long pfn_32bit);
+void init_iova_domain(struct iova_domain *iovad, unsigned long granule,
+	unsigned long start_pfn, unsigned long pfn_32bit);
[PATCH 2/4] iommu: consolidate IOVA allocator code
In order to share the IOVA allocator with other architectures, break the unnecessary dependency on the Intel IOMMU driver and move the remaining IOVA internals to iova.c

Signed-off-by: Robin Murphy robin.mur...@arm.com
---
 drivers/iommu/intel-iommu.c | 33 ++---
 drivers/iommu/iova.c        | 35 +++
 include/linux/iova.h        |  3 +++
 3 files changed, 40 insertions(+), 31 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 1232336..5699653 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -485,7 +485,6 @@ __setup("intel_iommu=", intel_iommu_setup);
 
 static struct kmem_cache *iommu_domain_cache;
 static struct kmem_cache *iommu_devinfo_cache;
-static struct kmem_cache *iommu_iova_cache;
 
 static inline void *alloc_pgtable_page(int node)
 {
@@ -523,16 +522,6 @@ static inline void free_devinfo_mem(void *vaddr)
 	kmem_cache_free(iommu_devinfo_cache, vaddr);
 }
 
-struct iova *alloc_iova_mem(void)
-{
-	return kmem_cache_alloc(iommu_iova_cache, GFP_ATOMIC);
-}
-
-void free_iova_mem(struct iova *iova)
-{
-	kmem_cache_free(iommu_iova_cache, iova);
-}
-
 static inline int domain_type_is_vm(struct dmar_domain *domain)
 {
 	return domain->flags & DOMAIN_FLAG_VIRTUAL_MACHINE;
@@ -3427,23 +3416,6 @@ static inline int iommu_devinfo_cache_init(void)
 	return ret;
 }
 
-static inline int iommu_iova_cache_init(void)
-{
-	int ret = 0;
-
-	iommu_iova_cache = kmem_cache_create("iommu_iova",
-					 sizeof(struct iova),
-					 0,
-					 SLAB_HWCACHE_ALIGN,
-					 NULL);
-	if (!iommu_iova_cache) {
-		printk(KERN_ERR "Couldn't create iova cache\n");
-		ret = -ENOMEM;
-	}
-
-	return ret;
-}
-
 static int __init iommu_init_mempool(void)
 {
 	int ret;
@@ -3461,7 +3433,7 @@ static int __init iommu_init_mempool(void)
 	kmem_cache_destroy(iommu_domain_cache);
 
 domain_error:
-	kmem_cache_destroy(iommu_iova_cache);
+	iommu_iova_cache_destroy();
 
 	return -ENOMEM;
 }
@@ -3470,8 +3442,7 @@ static void __init iommu_exit_mempool(void)
 {
 	kmem_cache_destroy(iommu_devinfo_cache);
 	kmem_cache_destroy(iommu_domain_cache);
-	kmem_cache_destroy(iommu_iova_cache);
-
+	iommu_iova_cache_destroy();
 }
 
 static void quirk_ioat_snb_local_iommu(struct pci_dev *pdev)
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index f6b17e6..520b8c8 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -18,6 +18,41 @@
 */
 
 #include <linux/iova.h>
+#include <linux/slab.h>
+
+static struct kmem_cache *iommu_iova_cache;
+
+int iommu_iova_cache_init(void)
+{
+	int ret = 0;
+
+	iommu_iova_cache = kmem_cache_create("iommu_iova",
+					 sizeof(struct iova),
+					 0,
+					 SLAB_HWCACHE_ALIGN,
+					 NULL);
+	if (!iommu_iova_cache) {
+		pr_err("Couldn't create iova cache\n");
+		ret = -ENOMEM;
+	}
+
+	return ret;
+}
+
+void iommu_iova_cache_destroy(void)
+{
+	kmem_cache_destroy(iommu_iova_cache);
+}
+
+struct iova *alloc_iova_mem(void)
+{
+	return kmem_cache_alloc(iommu_iova_cache, GFP_ATOMIC);
+}
+
+void free_iova_mem(struct iova *iova)
+{
+	kmem_cache_free(iommu_iova_cache, iova);
+}
 
 void
 init_iova_domain(struct iova_domain *iovad, unsigned long pfn_32bit)
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 19e81d5..ad0507c 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -39,6 +39,9 @@ static inline unsigned long iova_size(struct iova *iova)
 	return iova->pfn_hi - iova->pfn_lo + 1;
 }
 
+int iommu_iova_cache_init(void);
+void iommu_iova_cache_destroy(void);
+
 struct iova *alloc_iova_mem(void);
 void free_iova_mem(struct iova *iova);
 void free_iova(struct iova_domain *iovad, unsigned long pfn);
-- 
1.9.1
Re: [PATCH v8 02/10] iommu/vt-d: Items required for kdump
On Mon, Jan 12, 2015 at 11:15:38AM -0500, Vivek Goyal wrote:
> On Mon, Jan 12, 2015 at 05:06:46PM +0100, Joerg Roedel wrote:
> > On Mon, Jan 12, 2015 at 10:29:19AM -0500, Vivek Goyal wrote:
> > > Kdump has the notion of a backup region, where certain parts of the old kernel's memory can be moved to a different location (the first 640K on x86, as of now) so that the new kernel can make use of that memory. So we will just have to make sure that no part of the old page tables falls into the backup region.
> >
> > Uuh, looks like the 'iommu-with-kdump' issue isn't complicated enough yet ;) Sadly, your statement above is true for all hardware-accessible data structures in the IOMMU code. I am thinking about how we can solve this - is there an easy way to allocate memory that is not in any backup region?
>
> Hmm..., there does not seem to be any easy way to do this. In fact, as of now, the kernel does not even know where the backup region is. All these details are managed completely by user space (except for the new kexec_file_load() syscall).
>
> That means we are left with ugly options now.
>
> - Define per-arch kexec backup regions in the kernel and export them to user space, and let kexec-tools make use of that definition (instead of defining its own). That way, memory allocation code in the kernel can look at this backup area and skip it for certain allocations.

Yes, that makes sense. In fact, I think all allocations for DMA memory need to take this into account to avoid potentially serious data corruption. If any memory for a disk superblock gets allocated in backup memory and a kdump happens, the new kernel might zero out that area, and the disk controller would then write the zeroes to disk instead of the superblock.

	Joerg