Re: [PATCH v11 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
> This patchset is an update of Bill Sumner's patchset, and implements a fix for:
> If a kernel boots with intel_iommu=on on a system that supports Intel VT-d,
> when a panic happens, the kdump kernel will boot with these faults:
>
>     dmar: DRHD: handling fault status reg 102
>     dmar: DMAR:[DMA Read] Request device [01:00.0] fault addr fff8
>     DMAR:[fault reason 01] Present bit in root entry is clear
>
>     dmar: DRHD: handling fault status reg 2
>     dmar: INTR-REMAP: Request device [61:00.0] fault index 42
>     INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear

Test passed on my local workstation.

Thanks,
Zhenhua
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v11 05/10] iommu/vt-d: Add functions to load and save old re
Seems the subject was truncated? Maybe "re" means root entry? Then please fix it.

On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
> Add functions to load root entry table from old kernel, and to save updated
> root entry table.
> Add two members in struct intel_iommu, to store the RTA in old kernel, and
> the mapped virt address of it.

Please explain a bit what RTA is in the patch log, it is unclear to most of
the people who do not know iommu details.

> We use the old RTA in dump kernel, and when the iommu->root_entry is used
> as a cache in kdump kernel, its phys address will not be saved to the RTA
> register, but when its data is changed, we will save the new data to the
> old root entry table.
>
> Li, Zhen-hua:
>     The functions and logics.
>
> Takao Indoh:
>     Add __iommu_flush_cache.
>
> Signed-off-by: Li, Zhen-Hua zhen-h...@hp.com
> Signed-off-by: Takao Indoh indou.ta...@jp.fujitsu.com
> ---
>  drivers/iommu/intel-iommu.c | 54 +++++++++++++++++++++++++++++++++++++++++-
>  include/linux/intel-iommu.h |  3 +++
>  2 files changed, 56 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 0b97c15..3a5d446 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -371,6 +371,10 @@ static struct context_entry *device_to_existing_context_entry(
>  				struct intel_iommu *iommu,
>  				u8 bus, u8 devfn);
>
> +static void __iommu_load_old_root_entry(struct intel_iommu *iommu);
> +
> +static void __iommu_update_old_root_entry(struct intel_iommu *iommu, int index);
> +
>  /*
>   * A structure used to store the address allocated by ioremap();
>   * Then we need to call iounmap() to free them out of spin_lock_irqsave/unlock;
>   */
> @@ -382,7 +386,6 @@ struct iommu_remapped_entry {
>  static LIST_HEAD(__iommu_remapped_mem);
>  static DEFINE_MUTEX(__iommu_mem_list_lock);
>
> -
>  /*
>   * This domain is a statically identity mapping domain.
>   *	1. This domain creats a static 1:1 mapping to all usable memory.
> @@ -4935,3 +4938,52 @@ int __iommu_free_mapped_mem(void)
>  	return 0;
>  }
>
> +/*
> + * Load the old root entry table to new root entry table.
> + */
> +static void __iommu_load_old_root_entry(struct intel_iommu *iommu)
> +{
> +	if ((!iommu)
> +		|| (!iommu->root_entry)
> +		|| (!iommu->root_entry_old_virt)
> +		|| (!iommu->root_entry_old_phys))
> +		return;
> +	memcpy(iommu->root_entry, iommu->root_entry_old_virt, PAGE_SIZE);
> +
> +	__iommu_flush_cache(iommu, iommu->root_entry, PAGE_SIZE);
> +}
> +
> +/*
> + * When the data in new root entry table is changed, this function
> + * must be called to save the updated data to old root entry table.
> + */
> +static void __iommu_update_old_root_entry(struct intel_iommu *iommu, int index)
> +{
> +	u8 start;
> +	unsigned long size;
> +	void __iomem *to;
> +	void *from;
> +
> +	if ((!iommu)
> +		|| (!iommu->root_entry)
> +		|| (!iommu->root_entry_old_virt)
> +		|| (!iommu->root_entry_old_phys))
> +		return;
> +
> +	if (index < -1 || index >= ROOT_ENTRY_NR)
> +		return;
> +
> +	if (index == -1) {
> +		start = 0;
> +		size = ROOT_ENTRY_NR * sizeof(struct root_entry);
> +	} else {
> +		start = index * sizeof(struct root_entry);
> +		size = sizeof(struct root_entry);
> +	}
> +	to = iommu->root_entry_old_virt;
> +	from = iommu->root_entry;
> +	memcpy(to + start, from + start, size);
> +
> +	__iommu_flush_cache(iommu, to + start, size);
> +}
> +
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index ced1fac..e7cac12 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -340,6 +340,9 @@ struct intel_iommu {
>  	spinlock_t	lock; /* protect context, domain ids */
>  	struct root_entry *root_entry; /* virtual address */
>
> +	void __iomem	*root_entry_old_virt; /* mapped from old root entry */
> +	unsigned long	root_entry_old_phys; /* root entry in old kernel */
> +
>  	struct iommu_flush flush;
>  #endif
>  	struct q_inval *qi;	/* Queued invalidation info */
> --
> 2.0.0-rc0
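The update path above only writes back the bytes that changed: the whole table for index == -1, one entry otherwise. As a side note, the patch declares `start` as `u8`, which overflows for index >= 16 with 16-byte root entries. The sketch below is a hypothetical stand-alone model of the same offset math (not kernel code), using `size_t` for the offset:

```c
#include <assert.h>
#include <stddef.h>

/* User-space model of __iommu_update_old_root_entry()'s range math.
 * struct root_entry mirrors the VT-d layout (two 64-bit halves);
 * update_range() is an illustrative helper, not a kernel function. */
struct root_entry { unsigned long long lo, hi; };
#define ROOT_ENTRY_NR 256

static int update_range(int index, size_t *start, size_t *size)
{
	if (index < -1 || index >= ROOT_ENTRY_NR)
		return -1;			/* out of range: reject */

	if (index == -1) {			/* write back the whole table */
		*start = 0;
		*size = ROOT_ENTRY_NR * sizeof(struct root_entry);
	} else {				/* write back a single entry */
		*start = index * sizeof(struct root_entry);
		*size = sizeof(struct root_entry);
	}
	return 0;
}
```

Note that for index = 255 the offset is 4080, well past what a `u8` can hold.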
Re: [PATCH v11 02/10] iommu/vt-d: Items required for kdump
On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
> Add context entry functions needed for kdump.
>
> Bill Sumner:
>     Original version;
>
> Li, Zhenhua:
>     Changed the name of new functions, make them consistent with current
>     context get/set functions.
>     Remove the structure dve which is not used in new version.
>
> Signed-off-by: Bill Sumner billsumnerli...@gmail.com
> Signed-off-by: Li, Zhen-Hua zhen-h...@hp.com
> ---
>  drivers/iommu/intel-iommu.c | 72 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 72 insertions(+)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index cb9d6cc..1e7ceb5 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -190,6 +190,31 @@ struct root_entry {
>  };
>  #define ROOT_ENTRY_NR (VTD_PAGE_SIZE/sizeof(struct root_entry))
>
> +static inline bool root_present(struct root_entry *root)
> +{
> +	return (root->lo & 1);
> +}
> +
> +static inline void set_root_value(struct root_entry *root, unsigned long value)
> +{
> +	root->lo &= ~VTD_PAGE_MASK;
> +	root->lo |= value & VTD_PAGE_MASK;
> +}
> +
> +static inline struct context_entry *
> +get_context_addr_from_root(struct root_entry *root)
> +{
> +	return (struct context_entry *)
> +		(root_present(root) ? phys_to_virt(
> +		root->lo & VTD_PAGE_MASK) :
> +		NULL);
> +}
> +
> +static inline unsigned long
> +get_context_phys_from_root(struct root_entry *root)
> +{
> +	return root_present(root) ?
> +		(root->lo & VTD_PAGE_MASK) : 0;
> +}
>
>  /*
>   * low 64 bits:
> @@ -211,6 +236,32 @@ static inline bool context_present(struct context_entry *context)
>  {
>  	return (context->lo & 1);
>  }
> +
> +static inline int context_fault_enable(struct context_entry *c)
> +{
> +	return((c->lo >> 1) & 0x1);
> +}
> +
> +static inline int context_translation_type(struct context_entry *c)
> +{
> +	return((c->lo >> 2) & 0x3);
> +}
> +
> +static inline u64 context_address_root(struct context_entry *c)
> +{
> +	return((c->lo >> VTD_PAGE_SHIFT));
> +}
> +
> +static inline int context_address_width(struct context_entry *c)
> +{
> +	return((c->hi >> 0) & 0x7);
> +}
> +
> +static inline int context_domain_id(struct context_entry *c)
> +{
> +	return((c->hi >> 8) & 0xffff);
> +}
> +
>  static inline void context_set_present(struct context_entry *context)
>  {
>  	context->lo |= 1;
> @@ -296,6 +347,27 @@ static inline int first_pte_in_page(struct dma_pte *pte)
>  	return !((unsigned long)pte & ~VTD_PAGE_MASK);
>  }
>
> +
> +/*
> + * Fix Crashdump failure caused by leftover DMA through a hardware IOMMU
> + *
> + * Fixes the crashdump kernel to deal with an active iommu and legacy
> + * DMA from the (old) panicked kernel in a manner similar to how legacy
> + * DMA is handled when no hardware iommu was in use by the old kernel --
> + * allow the legacy DMA to continue into its current buffers.
> + *
> + * In the crashdump kernel, this code:
> + * 1. skips disabling the IOMMU's translating.
> + * 2. Do not re-enable IOMMU's translating.
> + * 3. In kdump kernel, use the old root entry table.
> + * 4. Allocate pages for new context entry, copy data from old context entries
> + *    in the old kernel to the new ones.
> + *
> + * In other kinds of kernel, for example, a kernel started by kexec,
> + * do the same thing as crashdump kernel.
> + */
> +
> +

Above comments should come along with the code changes instead of putting it
in this patch. BTW, there's one more blank line at the end..

>  /*
>   * This domain is a statically identity mapping domain.
>   *	1. This domain creats a static 1:1 mapping to all usable memory.
> --
> 2.0.0-rc0
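The context-entry accessors quoted above are plain bit extraction over the two 64-bit halves of a VT-d context entry. The sketch below is a hypothetical user-space model of the same extractions; field positions follow the quoted code, and nothing here is the driver itself:

```c
#include <assert.h>

/* Stand-alone model of the VT-d context-entry field accessors:
 * lo holds present (bit 0), fault-enable (bit 1), translation type
 * (bits 3:2); hi holds address width (bits 2:0) and domain id
 * (bits 23:8). */
typedef unsigned long long u64;
struct context_entry { u64 lo, hi; };

static int context_present(struct context_entry *c)
{
	return c->lo & 1;			/* bit 0: present */
}

static int context_translation_type(struct context_entry *c)
{
	return (c->lo >> 2) & 0x3;		/* bits 3:2 */
}

static int context_address_width(struct context_entry *c)
{
	return c->hi & 0x7;			/* bits 2:0 */
}

static int context_domain_id(struct context_entry *c)
{
	return (c->hi >> 8) & 0xffff;		/* bits 23:8 */
}
```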
Re: [PATCH v11 04/10] iommu/vt-d: functions to copy data from old mem
On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
> Add some functions to copy the data from old kernel.
> These functions are used to copy context tables and page tables.
>
> To avoid calling iounmap between spin_lock_irqsave and spin_unlock_irqrestore,
> use a link here, store the pointers, and then use iounmap to free them in
> another place.
>
> Li, Zhen-hua:
>     The functions and logics.
>
> Takao Indoh:
>     Check if pfn is ram:
>         if (page_is_ram(pfn))
>
> Signed-off-by: Li, Zhen-Hua zhen-h...@hp.com
> Signed-off-by: Takao Indoh indou.ta...@jp.fujitsu.com
> ---
>  drivers/iommu/intel-iommu.c | 102 ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/intel-iommu.h |   6 +++
>  2 files changed, 108 insertions(+)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 07e6118..0b97c15 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -371,6 +371,17 @@ static struct context_entry *device_to_existing_context_entry(
>  				struct intel_iommu *iommu,
>  				u8 bus, u8 devfn);
>
> +/*
> + * A structure used to store the address allocated by ioremap();
> + * Then we need to call iounmap() to free them out of spin_lock_irqsave/unlock;
> + */
> +struct iommu_remapped_entry {
> +	struct list_head list;
> +	void __iomem *mem;
> +};
> +static LIST_HEAD(__iommu_remapped_mem);
> +static DEFINE_MUTEX(__iommu_mem_list_lock);
> +
>  /*
>   * This domain is a statically identity mapping domain.
> @@ -4833,3 +4844,94 @@ static struct context_entry *device_to_existing_context_entry(
>  	return ret;
>  }
>
> +/*
> + * Copy memory from a physically-addressed area into a virtually-addressed area
> + */
> +int __iommu_load_from_oldmem(void *to, unsigned long from, unsigned long size)
> +{
> +	unsigned long pfn;		/* Page Frame Number */
> +	size_t csize = (size_t)size;	/* Num(bytes to copy) */
> +	unsigned long offset;		/* Lower 12 bits of to */

The comments for the above variables are unnecessary, they are clear enough.
BTW, using size_t for the size argument is better.
> +	void __iomem *virt_mem;
> +	struct iommu_remapped_entry *mapped;
> +
> +	pfn = from >> VTD_PAGE_SHIFT;
> +	offset = from & (~VTD_PAGE_MASK);
> +
> +	if (page_is_ram(pfn)) {
> +		memcpy(to, pfn_to_kaddr(pfn) + offset, csize);
> +	} else{
> +
> +		mapped = kzalloc(sizeof(struct iommu_remapped_entry),
> +				GFP_KERNEL);
> +		if (!mapped)
> +			return -ENOMEM;
> +
> +		virt_mem = ioremap_cache((unsigned long)from, size);
> +		if (!virt_mem) {
> +			kfree(mapped);
> +			return -ENOMEM;
> +		}
> +		memcpy(to, virt_mem, size);

csize or size? As with the previous comment, using size_t in the argument
will make it fine here.

> +
> +		mutex_lock(&__iommu_mem_list_lock);
> +		mapped->mem = virt_mem;
> +		list_add_tail(&mapped->list, &__iommu_remapped_mem);
> +		mutex_unlock(&__iommu_mem_list_lock);
> +	}
> +	return size;

Type mismatch?

> +}
> +
> +/*
> + * Copy memory from a virtually-addressed area into a physically-addressed area
> + */
> +int __iommu_save_to_oldmem(unsigned long to, void *from, unsigned long size)
> +{
> +	unsigned long pfn;		/* Page Frame Number */
> +	size_t csize = (size_t)size;	/* Num(bytes to copy) */
> +	unsigned long offset;		/* Lower 12 bits of to */

Ditto about data type and comments..

> +	void __iomem *virt_mem;
> +	struct iommu_remapped_entry *mapped;
> +
> +	pfn = to >> VTD_PAGE_SHIFT;
> +	offset = to & (~VTD_PAGE_MASK);
> +
> +	if (page_is_ram(pfn)) {
> +		memcpy(pfn_to_kaddr(pfn) + offset, from, csize);
> +	} else{
> +		mapped = kzalloc(sizeof(struct iommu_remapped_entry),
> +				GFP_KERNEL);
> +		if (!mapped)
> +			return -ENOMEM;
> +
> +		virt_mem = ioremap_cache((unsigned long)to, size);
> +		if (!virt_mem) {
> +			kfree(mapped);
> +			return -ENOMEM;
> +		}
> +		memcpy(virt_mem, from, size);
> +		mutex_lock(&__iommu_mem_list_lock);
> +		mapped->mem = virt_mem;
> +		list_add_tail(&mapped->list, &__iommu_remapped_mem);
> +		mutex_unlock(&__iommu_mem_list_lock);
> +	}
> +	return size;
> +}

The above two functions look duplicated, can they be merged into one function
with an added argument for the direction?
> +
> +/*
> + * Free the mapped memory for ioremap;
> + */
> +int __iommu_free_mapped_mem(void)
> +{
> +	struct iommu_remapped_entry *mem_entry, *tmp;
> +
> +	mutex_lock(&__iommu_mem_list_lock);
> +	list_for_each_entry_safe(mem_entry, tmp, &__iommu_remapped_mem, list) {
> +		iounmap(mem_entry->mem);
> +		list_del(&mem_entry->list);
> +		kfree(mem_entry);
> +	}
> +
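Dave's suggestion to merge the two copy helpers into one function with a direction argument could look roughly like the sketch below. It is a user-space model only: "oldmem" is an ordinary buffer here, and the page_is_ram()/ioremap_cache() split and the deferred-iounmap list are omitted; the names are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical merged form of __iommu_load_from_oldmem() and
 * __iommu_save_to_oldmem(): one helper, one direction flag, size_t
 * size as Dave suggested. Returns bytes copied or -1 on bad input. */
enum oldmem_dir { OLDMEM_LOAD, OLDMEM_SAVE };

static long oldmem_copy(void *virt, void *oldmem, size_t size,
			enum oldmem_dir dir)
{
	if (!virt || !oldmem)
		return -1;

	if (dir == OLDMEM_LOAD)
		memcpy(virt, oldmem, size);	/* old kernel -> new buffer */
	else
		memcpy(oldmem, virt, size);	/* new buffer -> old kernel */

	return (long)size;
}
```

A `size_t` argument also removes the signed/unsigned mismatch Dave flagged on the `return size;` lines.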
Re: [PATCH v11 06/10] iommu/vt-d: datatypes and functions used for kdump
The patch subject is bad. In a previous patch you use "Items required for
kdump", and in this patch you use "datatypes and functions used for kdump",
so what's the difference between these two patches? Please think about
better subjects for these patches..

On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
> Populate it with support functions to copy iommu translation tables from
> from the panicked kernel into the kdump kernel in the event of a crash.
>
> Functions:
>     Use old root entry table, and load the old data to root_entry as cache.
>     Malloc new context table and copy old context table to the new one.
>
> Bill Sumner:
>     Original version, the creation of the data types and functions.
>
> Li, Zhenhua:
>     Create new function iommu_check_pre_te_status() to check status.
>     Update the caller of context_get_* and context_put*, use context_*
>     and context_set_* for replacement.
>     Update the name of the function that loads root entry table.
>     Use new function to copy old context entry tables and page tables.
>     Use unsigned long for physical address.
>     Remove the functions to copy page table in Bill's version.
>     Remove usage of dve and ppap in Bill's version.
>
> Signed-off-by: Bill Sumner billsumnerli...@gmail.com
> Signed-off-by: Li, Zhen-Hua zhen-h...@hp.com
> ---
>  drivers/iommu/intel-iommu.c | 121 ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/intel-iommu.h |   3 ++
>  2 files changed, 124 insertions(+)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 3a5d446..28c3c64 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -386,6 +386,18 @@ struct iommu_remapped_entry {
>  static LIST_HEAD(__iommu_remapped_mem);
>  static DEFINE_MUTEX(__iommu_mem_list_lock);
>
> +/* ------------------------------------------------------------------------

Please remove the "-----"..

> + * Copy iommu translation tables from old kernel into new  kernel.
One more space between "new" and "kernel".

> + * Entry to this set of functions is: intel_iommu_load_translation_tables()
> + * ------------------------------------------------------------------------

Dropping the above "---" line is better.

> + */
> +
> +static int copy_root_entry_table(struct intel_iommu *iommu);
> +
> +static int intel_iommu_load_translation_tables(struct intel_iommu *iommu);
> +
> +static void iommu_check_pre_te_status(struct intel_iommu *iommu);
> +
>  /*
>   * This domain is a statically identity mapping domain.
>   *	1. This domain creats a static 1:1 mapping to all usable memory.
> @@ -4987,3 +4999,112 @@ static void __iommu_update_old_root_entry(struct intel_iommu *iommu, int index)
>  	__iommu_flush_cache(iommu, to + start, size);
>  }
>
> +/*
> + * Load root entry tables from old kernel.
> + */
> +static int copy_root_entry_table(struct intel_iommu *iommu)
> +{
> +	u32 bus;				/* Index: root-entry-table */
> +	struct root_entry *re;			/* Virt(iterator: new table) */
> +	unsigned long context_old_phys;		/* Phys(context table entry) */
> +	struct context_entry *context_new_virt;	/* Virt(new context_entry) */
> +
> +	/*
> +	 * A new root entry table has been allocated ,

One more space before ','.

> +	 * we need copy re from old kernel to the new allocated one.
> +	 */
> +
> +	if (!iommu->root_entry_old_phys)
> +		return -ENOMEM;
> +
> +	for (bus = 0, re = iommu->root_entry; bus < 256; bus += 1, re += 1) {
> +		if (!root_present(re))
> +			continue;
> +
> +		context_old_phys = get_context_phys_from_root(re);
> +
> +		if (!context_old_phys)
> +			continue;
> +
> +		context_new_virt =
> +			(struct context_entry *)alloc_pgtable_page(iommu->node);
> +
> +		if (!context_new_virt)
> +			return -ENOMEM;
> +
> +		__iommu_load_from_oldmem(context_new_virt,
> +					context_old_phys,
> +					VTD_PAGE_SIZE);
> +
> +		__iommu_flush_cache(iommu, context_new_virt, VTD_PAGE_SIZE);
> +
> +		set_root_value(re, virt_to_phys(context_new_virt));
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Interface to the load translation tables set of functions
> + * from mainline code.
> + */
> +static int intel_iommu_load_translation_tables(struct intel_iommu *iommu)
> +{
> +	unsigned long long q;		/* quadword scratch */
> +	int ret = 0;			/* Integer return code */

The comment for ret is not necessary, "quadword" is also unnecessary, just
explaining the purpose of 'q' is enough.

> +	unsigned long flags;
> +
> +	q = dmar_readq(iommu->reg + DMAR_RTADDR_REG);
> +	if (!q)
> +		return -1;
> +
> +	spin_lock_irqsave(&iommu->lock, flags);
> +
> +	/* Load the root-entry table from the old kernel
> +	 * foreach context_entry_table in root_entry
> +	 *	Copy each entry table from old kernel
> +	 */
> +	if
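The copy_root_entry_table() loop quoted above scans all 256 bus entries, allocates a fresh context table for each present one, and re-points the root entry at the new table while keeping its flag bits. The sketch below is a hypothetical user-space model of just that arithmetic: page addresses are supplied by the caller instead of alloc_pgtable_page(), and nothing else of the kernel path is kept:

```c
#include <assert.h>

/* Stand-alone model of the root-table copy walk. VTD_PAGE_* follow
 * the 4 KiB VT-d page layout; copy_root_table() is illustrative. */
#define VTD_PAGE_SHIFT 12
#define VTD_PAGE_MASK  (~((1ULL << VTD_PAGE_SHIFT) - 1))

struct root_entry { unsigned long long lo, hi; };

/* Walk the 256 root entries; each present one gets re-pointed at the
 * next caller-supplied "context table" page address. Returns the
 * number of pages consumed, or -1 if the caller supplied too few. */
static int copy_root_table(struct root_entry *table,
			   const unsigned long long *new_pages, int n_pages)
{
	int used = 0;

	for (int bus = 0; bus < 256; bus++) {
		struct root_entry *re = &table[bus];

		if (!(re->lo & 1))	/* present bit clear: skip */
			continue;
		if (used >= n_pages)
			return -1;
		/* keep the low flag bits, swap in the new table's address */
		re->lo = (re->lo & ~VTD_PAGE_MASK) |
			 (new_pages[used++] & VTD_PAGE_MASK);
	}
	return used;
}
```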
Re: [PATCH 2/3] iommu/arm-smmu: Add initial driver support for ARM SMMUv3 devices
Hi Leizhen,

Thanks for the review!

On Tue, May 12, 2015 at 08:40:06AM +0100, leizhen wrote:
> > +static int queue_poll_cons(struct arm_smmu_queue *q, u32 until, bool wfe)
> > +{
> > +	ktime_t timeout = ktime_add_us(ktime_get(), ARM_SMMU_POLL_TIMEOUT_US);
> > +
> > +	while (queue_sync_cons(q), __queue_cons_before(q, until)) {
> > +		if (ktime_compare(ktime_get(), timeout) > 0)
>
> Is it good to limit hardware behavior? Maybe waiting for ever will be
> better. If the SMMU can not consume queue items under normal conditions,
> the SMMU hardware is broken, and will lead the software system to crash
> later.

I disagree. Having a broken SMMU lock up the entire kernel is considerably
worse than having e.g. a DMA master get stuck. If this timeout expires, then
we'll print a message and continue without waiting any longer for a CMD_SYNC
completion (see arm_smmu_cmdq_issue_cmd).

> > +			return -ETIMEDOUT;
> > +
> > +		if (wfe) {
> > +			wfe();
> > +		} else {
> > +			cpu_relax();
> > +			udelay(1);
> > +		}
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static void queue_write(__le64 *dst, u64 *src, size_t n_dwords)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < n_dwords; ++i)
> > +		*dst++ = cpu_to_le64(*src++);
> > +}
> > +
> > +static int queue_insert_raw(struct arm_smmu_queue *q, u64 *ent)
> > +{
> > +	if (queue_full(q))
> > +		return -ENOSPC;
> > +
> > +	queue_write(Q_ENT(q, q->prod), ent, q->ent_dwords);
>
> A dmb or dsb may be needed. We must ensure all data written are completed,
> then notify hardware to consume.

The dsb is performed when we update the producer pointer in queue_inc_prod
(since we use writel as opposed to writel_relaxed).
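The queue_full() check discussed above depends on the SMMUv3-style index encoding: prod and cons each carry an extra wrap bit above the index bits, so "indices equal, wraps equal" means empty while "indices equal, wraps differ" means full. A minimal stand-alone sketch (illustrative queue size and names, not the driver's code):

```c
#include <assert.h>

/* Model of wrap-bit queue indexing: Q_SHIFT index bits plus one wrap
 * bit per pointer. A 16-entry queue is assumed for illustration. */
#define Q_SHIFT   4
#define Q_IDX(p)  ((p) & ((1u << Q_SHIFT) - 1))
#define Q_WRAP(p) ((p) & (1u << Q_SHIFT))

static int queue_empty(unsigned prod, unsigned cons)
{
	/* same slot, same lap: nothing pending */
	return Q_IDX(prod) == Q_IDX(cons) && Q_WRAP(prod) == Q_WRAP(cons);
}

static int queue_full(unsigned prod, unsigned cons)
{
	/* same slot, producer one lap ahead: no free slots */
	return Q_IDX(prod) == Q_IDX(cons) && Q_WRAP(prod) != Q_WRAP(cons);
}

static unsigned queue_inc(unsigned p)
{
	/* adding 1 naturally carries into the wrap bit at the boundary */
	return (p + 1) & ((1u << (Q_SHIFT + 1)) - 1);
}
```

Without the wrap bit, a full queue and an empty queue would be indistinguishable when the two indices are equal.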
> > +	queue_inc_prod(q);
> > +	return 0;
> > +}
> > +
> > +
> > +/* High-level queue accessors */
> > +static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
> > +{
> > +	memset(cmd, 0, CMDQ_ENT_DWORDS << 3);
> > +	cmd[0] |= (ent->opcode & CMDQ_0_OP_MASK) << CMDQ_0_OP_SHIFT;
> > +
> > +	switch (ent->opcode) {
> > +	case CMDQ_OP_TLBI_EL2_ALL:
> > +	case CMDQ_OP_CMD_SYNC:
> > +		cmd[0] |= CMDQ_SYNC_0_CS_SEV;
>
> We can not always set SIG_SEV. Actually it should be based upon
> SMMU_IDR0.SEV (smmu->features & ARM_SMMU_FEAT_SEV).

It doesn't matter for the CMD_SYNC command (which treats SIG_SEV as SIG_NONE
in this case). What actually matters is that we don't perform wfe() when the
SMMU can't issue the event, but that's already taken care of in
arm_smmu_cmdq_issue_cmd.

> > +		break;
> > +	default:
> > +		return -ENOENT;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
> > +				       struct io_pgtable_cfg *pgtbl_cfg)
> > +{
> > +	int ret;
> > +	u16 asid;
> > +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> > +	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
> > +
> > +	asid = arm_smmu_bitmap_alloc(smmu->asid_map, smmu->asid_bits);
> > +	if (IS_ERR_VALUE(asid))
> > +		return asid;
> > +
> > +	cfg->cdptr = dma_zalloc_coherent(smmu->dev, CTXDESC_CD_DWORDS << 3,
> > +					 &cfg->cdptr_dma, GFP_KERNEL);
>
> Why use dma_zalloc_coherent? The iova is converted from the PA by calling
> phys_to_dma. I am afraid the PA and iova may not be equal. In fact, the
> mapping between iova and PA relies on the SMMU driver itself. Why not use
> virt_to_phys to get the PA? The SMMU hardware actually requires a PA.

The SMMU structure walker is not allowed to be chained into another SMMU, so
the physical and DMA addresses are equal here. That said, the walker does not
need to be cache coherent, so I need to use the coherent DMA allocator to get
memory of the correct attributes. I also need the table to be naturally
aligned.
An alternative would be something like __get_free_pages with homebrew cache
maintenance for non-coherent SMMUs, but I don't want to re-invent the DMA
mapping code in the driver (we already have a similar mess with the page
tables, which I'd like to sort out).

> > +static int arm_smmu_alloc_l2_strtab(struct arm_smmu_device *smmu)
> > +{
> > +	int ret;
> > +	unsigned int i;
> > +	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> > +	size_t size = sizeof(*cfg->l1_desc) * cfg->num_l1_descs;
> > +	void *strtab = smmu->strtab_cfg.strtab;
> > +
> > +	cfg->l1_desc = devm_kzalloc(smmu->dev, size, GFP_KERNEL);
> > +	if (!cfg->l1_desc) {
> > +		dev_err(smmu->dev, "failed to allocate l1 stream table desc\n");
> > +		return -ENOMEM;
> > +	}
> > +
> > +	size = 1 << (STRTAB_SPLIT + ilog2(STRTAB_STE_DWORDS) + 3);
> > +	for (i = 0; i < cfg->num_l1_descs; ++i) {
> > +		struct arm_smmu_strtab_l1_desc *desc = &cfg->l1_desc[i];
> > +
> > +		desc->span = STRTAB_SPLIT + 1;
> > +		desc->l2ptr = dma_zalloc_coherent(smmu->dev, size,
> > +						  &desc->l2ptr_dma, GFP_KERNEL);
>
> No, no,
Re: [RFC PATCH] iommu: add ARM short descriptor page table allocator.
2015-04-28 9:41 GMT+02:00 Yong Wu yong...@mediatek.com:
> This patch is for the ARM Short Descriptor Format. It has a 2-level
> pagetable and the allocator supports 4K/64K/1M/16M.
>
> Signed-off-by: Yong Wu yong...@mediatek.com
> ---
>  drivers/iommu/Kconfig                |   7 +
>  drivers/iommu/Makefile               |   1 +
>  drivers/iommu/io-pgtable-arm-short.c | 489 +++++++++++++++++++++++++++++++++
>  drivers/iommu/io-pgtable.c           |   4 +
>  drivers/iommu/io-pgtable.h           |   6 +
>  5 files changed, 507 insertions(+)
>  create mode 100644 drivers/iommu/io-pgtable-arm-short.c
>
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index baa0d97..8e50e73 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -38,6 +38,13 @@ config IOMMU_IO_PGTABLE_LPAE_SELFTEST
>
>  	  If unsure, say N here.
>
> +config IOMMU_IO_PGTABLE_SHORT
> +	bool "ARMv7/v8 Short Descriptor Format"
> +	select IOMMU_IO_PGTABLE
> +	help
> +	  Enable support for the ARM Short descriptor pagetable format.
> +	  It has a 2-level pagetable and the allocator supports 4K/64K/1M/16M.
> +
>  endmenu
>
>  config IOMMU_IOVA
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 080ffab..815b3c8 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -3,6 +3,7 @@ obj-$(CONFIG_IOMMU_API) += iommu-traces.o
>  obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
> +obj-$(CONFIG_IOMMU_IO_PGTABLE_SHORT) += io-pgtable-arm-short.o
>  obj-$(CONFIG_IOMMU_IOVA) += iova.o
>  obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
>  obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o msm_iommu_dev.o
> diff --git a/drivers/iommu/io-pgtable-arm-short.c b/drivers/iommu/io-pgtable-arm-short.c
> new file mode 100644
> index 000..8d723ec
> --- /dev/null
> +++ b/drivers/iommu/io-pgtable-arm-short.c
> @@ -0,0 +1,489 @@
> +/*
> + * Copyright (c) 2014-2015 MediaTek Inc.
> + * Author: Yong Wu yong...@mediatek.com
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +#define pr_fmt(fmt)	"arm-short-desc io-pgtable: " fmt
> +
> +#include <linux/err.h>
> +#include <linux/mm.h>
> +#include <linux/iommu.h>
> +#include <linux/errno.h>
> +
> +#include "io-pgtable.h"
> +
> +typedef u32 arm_short_iopte;
> +
> +struct arm_short_io_pgtable {
> +	struct io_pgtable	iop;
> +	struct kmem_cache	*ptekmem;
> +	size_t			pgd_size;
> +	void			*pgd;
> +};
> +
> +#define io_pgtable_short_to_data(x)		\
> +	container_of((x), struct arm_short_io_pgtable, iop)
> +
> +#define io_pgtable_ops_to_pgtable(x)		\
> +	container_of((x), struct io_pgtable, ops)
> +
> +#define io_pgtable_short_ops_to_data(x)		\
> +	io_pgtable_short_to_data(io_pgtable_ops_to_pgtable(x))
> +
> +#define ARM_SHORT_MAX_ADDR_BITS			32
> +
> +#define ARM_SHORT_PGDIR_SHIFT			20
> +#define ARM_SHORT_PAGE_SHIFT			12
> +#define ARM_SHORT_PTRS_PER_PTE			256
> +#define ARM_SHORT_BYTES_PER_PTE			1024
> +
> +/* 1 level pagetable */
> +#define ARM_SHORT_F_PGD_TYPE_PAGE		(0x1)
> +#define ARM_SHORT_F_PGD_TYPE_PAGE_MSK		(0x3)
> +#define ARM_SHORT_F_PGD_TYPE_SECTION		(0x2)
> +#define ARM_SHORT_F_PGD_TYPE_SUPERSECTION	(0x2 | (1 << 18))
> +#define ARM_SHORT_F_PGD_TYPE_SECTION_MSK	(0x3 | (1 << 18))
> +#define ARM_SHORT_F_PGD_TYPE_IS_PAGE(pgd)	(((pgd) & 0x3) == 1)
> +#define ARM_SHORT_F_PGD_TYPE_IS_SECTION(pgd) \
> +	(((pgd) & ARM_SHORT_F_PGD_TYPE_SECTION_MSK) \
> +		== ARM_SHORT_F_PGD_TYPE_SECTION)
> +#define ARM_SHORT_F_PGD_TYPE_IS_SUPERSECTION(pgd) \
> +	(((pgd) & ARM_SHORT_F_PGD_TYPE_SECTION_MSK) \
> +		== ARM_SHORT_F_PGD_TYPE_SUPERSECTION)
> +
> +#define ARM_SHORT_F_PGD_B_BIT			BIT(2)
> +#define ARM_SHORT_F_PGD_C_BIT			BIT(3)
> +#define ARM_SHORT_F_PGD_IMPLE_BIT		BIT(9)
> +#define ARM_SHORT_F_PGD_S_BIT			BIT(16)
> +#define ARM_SHORT_F_PGD_NG_BIT			BIT(17)
> +#define ARM_SHORT_F_PGD_NS_BIT_PAGE		BIT(3)
> +#define ARM_SHORT_F_PGD_NS_BIT_SECTION		BIT(19)
> +
> +#define ARM_SHORT_F_PGD_PA_PAGETABLE_MSK	0xfffffc00
> +#define ARM_SHORT_F_PGD_PA_SECTION_MSK		0xfff00000
> +#define ARM_SHORT_F_PGD_PA_SUPERSECTION_MSK	0xff000000
> +
> +/* 2 level pagetable */
> +#define ARM_SHORT_F_PTE_TYPE_GET(val)	((val) & 0x3)
> +#define
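The truncated base-address masks in the mail are the usual ARMv7 short-descriptor ones (bits[31:10] for a page-table pointer, bits[31:20] for a section, bits[31:24] for a supersection), and a first-level descriptor's type lives in bits[1:0] plus bit 18. A stand-alone sketch of those type tests and masks, assuming the reconstructed values (names shortened for illustration):

```c
#include <assert.h>

/* Short-descriptor first-level type tests: bits[1:0] select page
 * table (0b01) vs section (0b10), and bit 18 distinguishes a
 * supersection from a section. */
#define PGD_TYPE_IS_PAGETABLE(pgd)	(((pgd) & 0x3) == 0x1)
#define PGD_TYPE_SECTION_MSK		(0x3u | (1u << 18))
#define PGD_TYPE_IS_SECTION(pgd) \
	(((pgd) & PGD_TYPE_SECTION_MSK) == 0x2)
#define PGD_TYPE_IS_SUPERSECTION(pgd) \
	(((pgd) & PGD_TYPE_SECTION_MSK) == (0x2 | (1u << 18)))

/* Base-address masks per descriptor kind */
#define PA_PAGETABLE_MSK	0xfffffc00u	/* bits[31:10] */
#define PA_SECTION_MSK		0xfff00000u	/* bits[31:20] */
#define PA_SUPERSECTION_MSK	0xff000000u	/* bits[31:24] */
```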
Re: [PATCH v11 05/10] iommu/vt-d: Add functions to load and save old re
On 05/12/2015 04:37 PM, Dave Young wrote:
> Seems the subject was truncated? Maybe "re" means root entry? Then please
> fix it.
>
> On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
> > Add functions to load root entry table from old kernel, and to save
> > updated root entry table.
> > Add two members in struct intel_iommu, to store the RTA in old kernel,
> > and the mapped virt address of it.
>
> Please explain a bit what RTA is in the patch log, it is unclear to most
> of the people who do not know iommu details.

If people want to understand this patchset, I assume they have read the
vt-d specs. RTA is defined clearly in this spec.
Re: [PATCH v11 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
One thing must be pointed out: there is a known issue that the hpsa driver
cannot work well in the kdump kernel, and this patchset is not intended to
fix that problem. So this patchset cannot work with HP smart array devices
which need the hpsa driver.

On 05/11/2015 05:52 PM, Li, Zhen-Hua wrote:
> This patchset is an update of Bill Sumner's patchset, and implements a fix for:
> If a kernel boots with intel_iommu=on on a system that supports Intel VT-d,
> when a panic happens, the kdump kernel will boot with these faults:
>
>     dmar: DRHD: handling fault status reg 102
>     dmar: DMAR:[DMA Read] Request device [01:00.0] fault addr fff8
>     DMAR:[fault reason 01] Present bit in root entry is clear
>
>     dmar: DRHD: handling fault status reg 2
>     dmar: INTR-REMAP: Request device [61:00.0] fault index 42
>     INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
>
> On some systems, the interrupt remapping fault will also happen even if
> intel_iommu is not set to on, because interrupt remapping will be enabled
> when x2apic is needed by the system.
>
> The cause of the DMA fault is described in Bill's original version, and the
> INTR-Remap fault is caused by a similar reason. In short, the initialization
> of the vt-d drivers causes the in-flight DMA and interrupt requests to get
> the wrong response.
>
> To fix this problem, we modify the behaviors of the intel vt-d in the
> crashdump kernel:
>
> For DMA Remapping:
> 1. To accept the vt-d hardware in an active state,
> 2. Do not disable and re-enable the translation, keep it enabled.
> 3. Use the old root entry table, do not rewrite the RTA register.
> 4. Malloc and use new context entry tables, copy data from the old ones
>    that were used by the old kernel.
> 5. Keep using the old page tables before the driver is loaded.
> 6. After the device driver is loaded, when it issues the first dma_map
>    command, free the dmar_domain structure for this device, and generate
>    a new one, so that the device can be assigned a new and empty page table.
> 7.
>    When a new context entry table is generated, we also save its address
>    to the old root entry table.
>
> For Interrupt Remapping:
> 1. To accept the vt-d hardware in an active state,
> 2. Do not disable and re-enable the interrupt remapping, keep it enabled.
> 3. Use the old interrupt remapping table, do not rewrite the IRTA register.
> 4. When an ioapic entry is set up, the interrupt remapping table is changed,
>    and the updated data will be stored to the old interrupt remapping table.
>
> Advantages of this approach:
> 1. All manipulation of the IO-device is done by the Linux device-driver
>    for that device.
> 2. This approach behaves in a manner very similar to operation without an
>    active iommu.
> 3. Any activity between the IO-device and its RMRR areas is handled by the
>    device-driver in the same manner as during a non-kdump boot.
> 4. If an IO-device has no driver in the kdump kernel, it is simply left
>    alone. This supports the practice of creating a special kdump kernel
>    without drivers for any devices that are not required for taking a
>    crashdump.
> 5. Minimal code-changes among the existing mainline intel vt-d code.
>
> Summary of changes in this patch set:
> 1. Added some useful functions for the root entry table in intel-iommu.c
> 2. Added new members to struct root_entry and struct irte;
> 3. Functions to load the old root entry table to iommu->root_entry from
>    the memory of the old kernel.
> 4. Functions to malloc new context entry tables and copy the data from
>    the old ones to the malloced new ones.
> 5. Functions to enable support for DMA remapping in the kdump kernel.
> 6. Functions to load old irte data from the old kernel to the kdump kernel.
> 7. Some code changes that support other behaviours that have been listed.
> 8. In the new functions, use physical addresses as unsigned long type,
>    not pointers.
> Original version by Bill Sumner:
>     https://lkml.org/lkml/2014/1/10/518
>     https://lkml.org/lkml/2014/4/15/716
>     https://lkml.org/lkml/2014/4/24/836
>
> Zhenhua's updates:
>     https://lkml.org/lkml/2014/10/21/134
>     https://lkml.org/lkml/2014/12/15/121
>     https://lkml.org/lkml/2014/12/22/53
>     https://lkml.org/lkml/2015/1/6/1166
>     https://lkml.org/lkml/2015/1/12/35
>     https://lkml.org/lkml/2015/3/19/33
>     https://lkml.org/lkml/2015/4/10/135
>
> Changelog[v11]:
>     1. Fix some conflicts with the latest upstream kernel.
>        Add flush in iommu_context_addr.
>
> Changelog[v10]:
>     1. Do not use CONFIG_CRASH_DUMP and is_kdump_kernel().
>        Use one flag which stores the te and ir status in last kernel:
>            iommu->pre_enabled_trans
>            iommu->pre_enabled_ir
>
> Changelog[v9]:
>     1. Add new function iommu_attach_domain_with_id.
>     2. Do not copy old page tables, keep using the old ones.
>     3. Remove functions:
>            intel_iommu_did_to_domain_values_entry
>            intel_iommu_get_dids_from_old_kernel
>            device_to_domain_id
>            copy_page_addr
>            copy_page_table
>            copy_context_entry
Re: [PATCH v11 02/10] iommu/vt-d: Items required for kdump
On 05/12/2015 04:17 PM, Dave Young wrote:
> On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
> > Add context entry functions needed for kdump.
> >
> > +/*
> > + * Fix Crashdump failure caused by leftover DMA through a hardware IOMMU
> > + *
> > + * Fixes the crashdump kernel to deal with an active iommu and legacy
> > + * DMA from the (old) panicked kernel in a manner similar to how legacy
> > + * DMA is handled when no hardware iommu was in use by the old kernel --
> > + * allow the legacy DMA to continue into its current buffers.
> > + *
> > + * In the crashdump kernel, this code:
> > + * 1. skips disabling the IOMMU's translating.
> > + * 2. Do not re-enable IOMMU's translating.
> > + * 3. In kdump kernel, use the old root entry table.
> > + * 4. Allocate pages for new context entry, copy data from old context
> > + *    entries in the old kernel to the new ones.
> > + *
> > + * In other kinds of kernel, for example, a kernel started by kexec,
> > + * do the same thing as crashdump kernel.
> > + */
> > +
> > +
>
> Above comments should come along with the code changes instead of putting
> it in this patch. BTW, there's one more blank line at the end..

The code change is in 00/10, the cover letter. And the blank line does not
matter; I checked the patch with scripts/checkpatch.pl, no warnings, no
errors.
Re: [PATCH v11 07/10] iommu/vt-d: enable kdump support in iommu module
On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
> Modify the operation of the following functions when called during crash
> dump:
>     iommu_context_addr
>     free_context_table
>     get_domain_for_dev
>     init_dmars
>     intel_iommu_init
>
> Bill Sumner:
>     Original version.
>
> Zhenhua:
>     The names of new calling functions.
>     Do not disable and re-enable TE in kdump kernel.
>     Use the did and gaw from the old context entry;
>
> Signed-off-by: Bill Sumner billsumnerli...@gmail.com
> Signed-off-by: Li, Zhen-Hua zhen-h...@hp.com
> ---
>  drivers/iommu/intel-iommu.c | 95 +++++++++++++++++++++++++++++++++++------
>  1 file changed, 83 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 28c3c64..91545bf 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -397,6 +397,7 @@ static int copy_root_entry_table(struct intel_iommu *iommu);
>  static int intel_iommu_load_translation_tables(struct intel_iommu *iommu);
>
>  static void iommu_check_pre_te_status(struct intel_iommu *iommu);
> +static u8 g_translation_pre_enabled;

Hi Zhenhua,

I haven't checked the patches one by one, I am just going through the code
flow. About g_translation_pre_enabled, I don't think it's necessary to
define it as a global variable. Both its assignment and judgement are in
init_dmars(). In this situation a local variable translation_pre_enabled
in init_dmars() is enough. You can assign a value to it here:

	iommu_check_pre_te_status(iommu);
	if (iommu->pre_enabled_trans) {
		translation_pre_enabled = 1;
		...
	}

Thanks
Baoquan
Re: [PATCH v11 07/10] iommu/vt-d: enable kdump support in iommu module
On 05/13/15 at 10:28am, Li, ZhenHua wrote:
>> +static u8 g_translation_pre_enabled;
>>
>> Hi Zhenhua,
>>
>> I haven't checked the patches one by one; I am going through the code
>> flow. About g_translation_pre_enabled, I don't think it's necessary to
>> define it as a global variable. Both its assignment and the check on it
>> are in init_dmars(). In this situation a local variable
>> translation_pre_enabled in init_dmars() is enough. You can assign a
>> value to it here:
>>
>>	iommu_check_pre_te_status(iommu);
>>	if (iommu->pre_enabled_trans) {
>>		translation_pre_enabled = 1;
>>		...
>>	}
>>
>> Thanks
>> Baoquan
>
> Hi Baoquan,
>
> This variable is only used in this file, as it is defined static. So far
> I think both a global and a local variable are fine: they do the same
> thing. But I believe a global is better, because if other functions want
> to know whether translation is enabled, this global variable is a good
> choice.

OK, I don't insist on this. But I don't think we are obliged to consider
future usage of a function or variable. We often see a static function
redefined as non-static once it needs to be used in another file, and the
same for variables. It is also fine to change it when that is really
needed. Anyway, I am fine with this; just for now it's a little
uncomfortable to me.

Thanks
Baoquan
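The check the two are discussing boils down to reading the translation-enable status the previous kernel left behind. Here is a toy, standalone model of that check: the register read is simulated with a plain field (the real code reads the TES bit from the DMAR global status MMIO register), and the struct name is made up for this sketch.

```c
#include <stdint.h>

#define DMA_GSTS_TES (1u << 31)  /* Translation Enable Status bit (VT-d GSTS) */

struct fake_iommu {
    uint32_t gsts;              /* simulated DMAR_GSTS_REG value */
    uint8_t  pre_enabled_trans;
};

/* Model of iommu_check_pre_te_status(): if the previous (panicked) kernel
 * left translation enabled, the TES status bit is still set when the kdump
 * kernel comes up, since this code deliberately never disables it. */
static void check_pre_te_status(struct fake_iommu *iommu)
{
    iommu->pre_enabled_trans = (iommu->gsts & DMA_GSTS_TES) ? 1 : 0;
}
```

Baoquan's suggestion would then simply hoist the result into a local `int translation_pre_enabled` inside `init_dmars()` rather than a file-scope `static u8`, since nothing outside that function reads it today.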
Re: [PATCH v11 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
To fix the dmar fault seen when running v10 with the latest upstream
version, a flush was added in the new function iommu_context_addr.

Thanks
Zhenhua

This patchset is an update of Bill Sumner's patchset, and implements a fix
for the following problem: if a kernel boots with intel_iommu=on on a
system that supports intel vt-d, then after a panic the kdump kernel will
boot with these faults:

    dmar: DRHD: handling fault status reg 102
    dmar: DMAR:[DMA Read] Request device [01:00.0] fault addr fff8
    DMAR:[fault reason 01] Present bit in root entry is clear

    dmar: DRHD: handling fault status reg 2
    dmar: INTR-REMAP: Request device [61:00.0] fault index 42
    INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear

On some systems the interrupt remapping fault will happen even if
intel_iommu is not set to on, because interrupt remapping is enabled
whenever x2apic is needed by the system.

The cause of the DMA fault is described in Bill's original version, and the
INTR-REMAP fault is caused by a similar reason. In short, the
initialization of the vt-d drivers causes in-flight DMA and interrupt
requests to get wrong responses.

To fix this problem, we modify the behavior of intel vt-d in the crashdump
kernel:

For DMA Remapping:
1. Accept the vt-d hardware in an active state.
2. Do not disable and re-enable the translation; keep it enabled.
3. Use the old root entry table; do not rewrite the RTA register.
4. Allocate new context entry tables and copy data from the old ones used
   by the old kernel.
5. Keep using the old page tables until the device driver is loaded.
6. After a device driver is loaded, when it issues the first dma_map
   command, free the dmar_domain structure for this device and generate a
   new one, so that the device can be assigned a new and empty page table.
7. When a new context entry table is generated, also save its address to
   the old root entry table.

For Interrupt Remapping:
1. Accept the vt-d hardware in an active state.
2. Do not disable and re-enable the interrupt remapping; keep it enabled.
3. Use the old interrupt remapping table; do not rewrite the IRTA register.
4. When an ioapic entry is set up, the interrupt remapping table is
   changed, and the updated data is stored back to the old interrupt
   remapping table.

Advantages of this approach:
1. All manipulation of the IO-device is done by the Linux device driver for
   that device.
2. This approach behaves in a manner very similar to operation without an
   active iommu.
3. Any activity between the IO-device and its RMRR areas is handled by the
   device driver in the same manner as during a non-kdump boot.
4. If an IO-device has no driver in the kdump kernel, it is simply left
   alone. This supports the practice of creating a special kdump kernel
   without drivers for any devices that are not required for taking a
   crashdump.
5. Minimal code changes to the existing mainline intel vt-d code.

Summary of changes in this patch set:
1. Added some useful functions for the root entry table in intel-iommu.c.
2. Added new members to struct root_entry and struct irte.
3. Functions to load the old root entry table into iommu->root_entry from
   the memory of the old kernel.
4. Functions to allocate new context entry tables and copy the data from
   the old ones into the newly allocated ones.
5. Functions to enable support for DMA remapping in the kdump kernel.
6. Functions to load old irte data from the old kernel into the kdump
   kernel.
7. Some code changes that support the other behaviors listed above.
8. In the new functions, use physical addresses as unsigned long, not
   pointers.

Original version by Bill Sumner:
    https://lkml.org/lkml/2014/1/10/518
    https://lkml.org/lkml/2014/4/15/716
    https://lkml.org/lkml/2014/4/24/836

Zhenhua's updates:
    https://lkml.org/lkml/2014/10/21/134
    https://lkml.org/lkml/2014/12/15/121
    https://lkml.org/lkml/2014/12/22/53
    https://lkml.org/lkml/2015/1/6/1166
    https://lkml.org/lkml/2015/1/12/35
    https://lkml.org/lkml/2015/3/19/33
    https://lkml.org/lkml/2015/4/10/135

Changelog[v11]:
1. Fix some conflicts with the latest upstream kernel. Add a flush in
   iommu_context_addr.

Changelog[v10]:
1. Do not use CONFIG_CRASH_DUMP and is_kdump_kernel(). Use flags which
   store the te and ir status in the last kernel:
       iommu->pre_enabled_trans
       iommu->pre_enabled_ir

Changelog[v9]:
1. Add new function iommu_attach_domain_with_id.
2. Do not copy old page tables; keep using the old ones.
3. Remove functions:
       intel_iommu_did_to_domain_values_entry
       intel_iommu_get_dids_from_old_kernel
       device_to_domain_id
       copy_page_addr
       copy_page_table
       copy_context_entry
       copy_context_entry_table
4. Add new function device_to_existing_context_entry.

Changelog[v8]:
1. Add a missing __iommu_flush_cache in function copy_page_table.
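Step 3 of the DMA remapping plan above ("use the old root entry table, do not rewrite the RTA register") relies on recovering the old table's physical address from the register the panicked kernel programmed. A minimal sketch of that address extraction, assuming only that the root table address lives in the upper bits of a 4 KiB-aligned register value (the real VT-d register has additional flag bits in its low 12 bits that this toy ignores):

```c
#include <stdint.h>

/* VT-d Root Table Address register: bits 63:12 hold the 4 KiB-aligned
 * physical address of the root entry table; low bits carry flags. */
#define VTD_PAGE_SHIFT 12
#define VTD_PAGE_MASK  (~((1ULL << VTD_PAGE_SHIFT) - 1))

/* Instead of allocating a fresh root table and writing a new address into
 * RTA, the kdump kernel reads back the address the old kernel programmed
 * and keeps using that table. */
static uint64_t old_root_table_phys(uint64_t rta_reg)
{
    return rta_reg & VTD_PAGE_MASK;
}
```

The returned physical address is then mapped (via ioremap() in the patch set) so the kdump kernel can read and, per step 7, update the old table in place.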
Re: [PATCH v11 07/10] iommu/vt-d: enable kdump support in iommu module
On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:

Modify the operation of the following functions when called during crash
dump:
    iommu_context_addr
    free_context_table
    get_domain_for_dev
    init_dmars
    intel_iommu_init

Bill Sumner: Original version.
Zhenhua: The names of the newly called functions. Do not disable and
re-enable TE in the kdump kernel. Use the did and gaw from the old context
entry.

Signed-off-by: Bill Sumner <billsumnerli...@gmail.com>
Signed-off-by: Li, Zhen-Hua <zhen-h...@hp.com>
---
 drivers/iommu/intel-iommu.c | 95 +++--
 1 file changed, 83 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 28c3c64..91545bf 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -397,6 +397,7 @@ static int copy_root_entry_table(struct intel_iommu *iommu);
 static int intel_iommu_load_translation_tables(struct intel_iommu *iommu);
 static void iommu_check_pre_te_status(struct intel_iommu *iommu);
 
+static u8 g_translation_pre_enabled;
 
 /*
  * This domain is a statically identity mapping domain.
@@ -794,6 +795,9 @@ static inline struct context_entry *iommu_context_addr(struct intel_iommu *iommu
 		phy_addr = virt_to_phys((void *)context);
 		*entry = phy_addr | 1;
 		__iommu_flush_cache(iommu, entry, sizeof(*entry));
+
+		if (iommu->pre_enabled_trans)
+			__iommu_update_old_root_entry(iommu, bus);
 	}
 	return &context[devfn];
 }
@@ -887,13 +891,15 @@ static void clear_context_table(struct intel_iommu *iommu, u8 bus, u8 devfn)
 
 static void free_context_table(struct intel_iommu *iommu)
 {
+	struct root_entry *root = NULL;
 	int i;
 	unsigned long flags;
 	struct context_entry *context;
 
 	spin_lock_irqsave(&iommu->lock, flags);
 	if (!iommu->root_entry) {
-		goto out;
+		spin_unlock_irqrestore(&iommu->lock, flags);
+		return;
 	}
 	for (i = 0; i < ROOT_ENTRY_NR; i++) {
 		context = iommu_context_addr(iommu, i, 0, 0);
@@ -908,10 +914,23 @@ static void free_context_table(struct intel_iommu *iommu)
 		free_pgtable_page(context);
 	}
+
+	if (iommu->pre_enabled_trans) {
+		iommu->root_entry_old_phys = 0;
+		root = iommu->root_entry_old_virt;
+		iommu->root_entry_old_virt = NULL;
+	}
+
 	free_pgtable_page(iommu->root_entry);
 	iommu->root_entry = NULL;
-out:
+
 	spin_unlock_irqrestore(&iommu->lock, flags);
+
+	/* We put this outside of the spin_lock/unlock pair because
+	 * iounmap() may cause an error if surrounded by spin_lock and
+	 * unlock;
+	 */
+	if (iommu->pre_enabled_trans)
+		iounmap(root);
 }
 
 static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
@@ -2333,6 +2352,7 @@ static struct dmar_domain *get_domain_for_dev(struct device *dev, int gaw)
 	unsigned long flags;
 	u8 bus, devfn;
 	int did = -1;	/* Default to no domain_id supplied */
+	struct context_entry *ce = NULL;
 
 	domain = find_domain(dev);
 	if (domain)
@@ -2366,6 +2386,20 @@ static struct dmar_domain *get_domain_for_dev(struct device *dev, int gaw)
 	domain = alloc_domain(0);
 	if (!domain)
 		return NULL;
+
+	if (iommu->pre_enabled_trans) {
+		/*
+		 * if this device had a did in the old kernel
+		 * use its values instead of generating new ones
+		 */
+		ce = device_to_existing_context_entry(iommu, bus, devfn);
+
+		if (ce) {
+			did = context_domain_id(ce);
+			gaw = agaw_to_width(context_address_width(ce));
+		}
+	}
+
 	domain->id = iommu_attach_domain_with_id(domain, iommu, did);
 	if (domain->id < 0) {
 		free_domain_mem(domain);
@@ -2897,6 +2931,7 @@ static int __init init_dmars(void)
 		goto free_g_iommus;
 	}
 
+	g_translation_pre_enabled = 0; /* To know whether to skip RMRR */
 	for_each_active_iommu(iommu, drhd) {
 		g_iommus[iommu->seq_id] = iommu;
 
@@ -2904,14 +2939,30 @@ static int __init init_dmars(void)
 		if (ret)
 			goto free_iommu;
 
-		/*
-		 * TBD:
-		 * we could share the same root context tables
-		 * among all IOMMU's. Need to Split it later.
-		 */
-		ret = iommu_alloc_root_entry(iommu);
-		if (ret)
-			goto free_iommu;
+		iommu_check_pre_te_status(iommu);
+		if (iommu->pre_enabled_trans) {
+			pr_info("IOMMU Copying translate tables from panicked kernel\n");
+			ret = intel_iommu_load_translation_tables(iommu);
+
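The did/gaw reuse in the get_domain_for_dev() hunk above depends on the bit layout of a VT-d context entry. A standalone sketch of those helpers, following the layout the mainline intel-iommu helpers use (upper 64 bits: bits 2:0 hold the address width, bits 23:8 the domain id) — simplified here and worth double-checking against the VT-d specification before relying on it:

```c
#include <stdint.h>

/* Simplified VT-d context entry: two 64-bit halves. */
struct context_entry { uint64_t lo, hi; };

/* Domain identifier lives in hi bits 23:8. */
static int context_domain_id(const struct context_entry *c)
{
    return (int)((c->hi >> 8) & 0xffff);
}

/* Adjusted guest address width (AGAW) encoding lives in hi bits 2:0. */
static int context_address_width(const struct context_entry *c)
{
    return (int)(c->hi & 7);
}

/* agaw 1 -> 39-bit, agaw 2 -> 48-bit, as in the kernel's agaw_to_width() */
static int agaw_to_width(int agaw)
{
    int w = 30 + agaw * 9;
    return w < 64 ? w : 64;   /* capped at MAX_AGAW_WIDTH (64) */
}
```

Reusing these values, rather than generating fresh ones, keeps the new domain consistent with the translations the panicked kernel's in-flight DMA still references.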
Re: [PATCH v11 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel
On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
[full v11 cover letter quoted, identical to the cover letter earlier in
this thread -- snipped]
Re: [PATCH/RFC 0/4] Probe deferral for IOMMU DT integration
Hi Marek,

On Wednesday 04 March 2015 10:20:36 Marek Szyprowski wrote:
On 2015-02-16 17:14, Laurent Pinchart wrote:
On Thursday 05 February 2015 16:31:58 Laura Abbott wrote:

Hi,

On the heels of Exynos integrating SMMU into the DT for probing, this
series attempts to add support to make SMMU drivers work with deferred
probing. This has been referenced before [1], but this is some actual code
to use as a starting point for discussion.

The requirement for this is based on a previous patch to add clock support
to the ARM SMMU driver [2]. Once we have clock support, it's possible that
the driver itself may need to be deferred, which breaks a bunch of
assumptions about how SMMU probing is supposed to work. The concept here is
to have the driver call of_dma_configure, which may return -EPROBE_DEFER.
of_dma_configure could possibly be moved up to probe level. The existing
method of initialization still needs to be done as well, which might mean
we have the worst of both worlds.

Open questions:
- This still doesn't really address Arnd's concerns [3] about disabling
  IOMMUs.

Arnd, Will and I discussed IOMMU probe deferral last week. Here's a summary
of the discussion, before the details blur away. The discussion covered
both higher-level concepts and lower-level details, in various directions.
I'll try to make the summary clear by filling the gaps between pieces where
needed, so some of the information below might not be a direct result of
the discussions. Arnd and Will, please feel free to correct me.

The first question we discussed was whether probe deferral for IOMMUs is
really needed. Various dependencies between IOMMU devices and other devices
exist, in particular on clocks (as you have mentioned above) and on power
domains (as mentioned by Marek in
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-February/323238.html).
While there are mechanisms to handle some of them with probe deferral (for
instance by using the OF_DECLARE macros to register clock drivers),
generalizing those mechanisms would essentially recreate a probe ordering
mechanism similar to link-order probe ordering and couldn't really scale.
Additionally, IOMMUs could also be present in hot-pluggable devices and
depend on resources that are thus hot-plugged. OF_DECLARE wouldn't help in
that case. For all those reasons we have concluded that probe deferral for
IOMMUs is desired if it can be implemented cleanly.

The second question we discussed was how to implement probe deferral
cleanly :-)

The current implementation configures DMA at device creation time and sets
the following properties:

- dma_map_ops (IOMMU, SWIOTLB, linear mapping)
- initial DMA mask
- DMA offset
- DMA coherency

Additionally, the IOMMU group (when an IOMMU is present) will also be
configured at the same time (patches are under review).

The base idea is to defer probing of bus master devices when their IOMMU
isn't present. Three cases need to be handled.

1. No IOMMU is declared by the firmware (through DT, ACPI, ...) for the bus
   master device. The bus master device probe doesn't need to be deferred
   due to the IOMMU. dma_map_ops must be set to SWIOTLB or linear mapping
   (the latter should likely be implemented through SWIOTLB).

2. An IOMMU is declared for the bus master device and the IOMMU has been
   successfully probed and registered. The bus master device probe doesn't
   need to be deferred due to the IOMMU. dma_map_ops must be set to IOMMU
   ops.

3. An IOMMU is declared for the bus master device but the IOMMU isn't
   registered. This can be caused by different reasons:

   a. No driver is loaded for this IOMMU (for instance because DT describes
      the IOMMU connection, but the IOMMU driver hasn't been developed yet,
      or an older kernel is used). If the IOMMU is optional, the bus master
      device probe should succeed, and dma_map_ops should be set to linear.
      If the IOMMU is mandatory, the bus master device probe should fail.
      Note that, as we require IOMMU drivers to be compiled in, we don't
      need to handle the case of loadable IOMMU drivers that haven't been
      loaded yet.

   b. A driver is loaded for this IOMMU but the IOMMU hasn't been probed
      yet, or its probe has been deferred. The bus master device probe
      should be deferred.

   c. A driver is loaded for this IOMMU but the IOMMU probe has failed
      permanently. It's not clear at the moment whether we should try to
      recover from this automatically using the same mechanism as case 3.a,
      or if we can consider this an abnormal failure and disable the bus
      master device. Assuming that we should try to recover from such an
      error, we can't predict this case when the device is instantiated
      (even if IOMMUs are registered before bus master devices are added,
      for instance using the OF_DECLARE mechanism that Will has
      implemented). We thus
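The three cases above can be condensed into a small decision table. This is an illustrative encoding, not kernel API: `-EPROBE_DEFER` (517) is the only value taken from the kernel, and the enum and function names are made up for this sketch.

```c
/* -EPROBE_DEFER is the kernel's "retry this probe later" error code. */
#define EPROBE_DEFER 517

enum iommu_state {
    IOMMU_NONE_DECLARED,    /* case 1: firmware declares no IOMMU       */
    IOMMU_REGISTERED,       /* case 2: IOMMU probed and registered      */
    IOMMU_NO_DRIVER,        /* case 3a: no driver for this IOMMU        */
    IOMMU_NOT_PROBED_YET,   /* case 3b: driver present, probe pending   */
    IOMMU_PROBE_FAILED,     /* case 3c: probe failed permanently        */
};

/* Returns 0 when the bus master may probe now, -EPROBE_DEFER when its
 * probe should be retried later, and -1 when it must fail (mandatory
 * IOMMU missing, or permanent failure treated as fatal -- the open
 * question in case 3c). */
static int master_probe_decision(enum iommu_state s, int iommu_mandatory)
{
    switch (s) {
    case IOMMU_NONE_DECLARED:  return 0;            /* SWIOTLB / linear ops */
    case IOMMU_REGISTERED:     return 0;            /* IOMMU dma_map_ops    */
    case IOMMU_NO_DRIVER:      return iommu_mandatory ? -1 : 0;
    case IOMMU_NOT_PROBED_YET: return -EPROBE_DEFER;
    case IOMMU_PROBE_FAILED:   return -1;
    }
    return -1;
}
```

Note that case 3c is shown here as a hard failure only because the thread has not settled on recovery semantics; the alternative discussed is to fall back to the case-3a behavior.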
Re: [PATCH v6 00/25] Exynos SYSMMU (IOMMU) integration with DT and DMA-mapping subsystem
On Mon, May 11, 2015 at 6:00 PM, Javier Martinez Canillas
<jav...@dowhile0.org> wrote:

Hello Marek,

On Mon, May 4, 2015 at 10:15 AM, Marek Szyprowski
<m.szyprow...@samsung.com> wrote:

Hello Everyone,

This is yet another attempt to get the Exynos SYSMMU driver integrated with
the IOMMU and DMA-mapping subsystems. The main change from the previous
version is the addition of patches to define iommu-mapping, which needs to
be created during system boot to avoid IOMMU faults by devices that have
been left enabled by the bootloader (i.e. a framebuffer displaying a splash
screen). Patches have also been rebased onto v4.1-rc2 with the
'arm: dma-mapping: fix off-by-one check in arm_setup_iommu_dma_ops' patch
applied (see commit 1424532b2163bf1580f4b1091a5801e12310fac5 on the fixes
branch in
git://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-arm.git, more
information: http://www.spinics.net/lists/arm-kernel/msg414722.html). All
patches are also available in the following git repository:
https://git.linaro.org/people/marek.szyprowski/linux-srpol.git branch
v4.1-exynos-iommu.

Thanks for the new series. This time I didn't get a system hang when I
enabled both CONFIG_DRM_EXYNOS_FIMD and CONFIG_DRM_EXYNOS_DP on my
Exynos5420 Peach Pit Chromebook. The system finished booting and I have
both a console on the HDMI and eDP/LVDS displays.

But if I try to start X or weston, I again get a complete system hang with
no output on the serial console. This works correctly without your IOMMU
series, and again, disabling CONFIG_DRM_EXYNOS_DP or CONFIG_DRM_EXYNOS_FIMD
avoids the system hang. If I only have CONFIG_DRM_EXYNOS_HDMI enabled, then
I can start X without issues and it is displayed correctly on the HDMI
output. So it seems your workaround for the bootloader leaving the FIMD DMA
engine enabled is not enough?

I'm not a graphics person, so I can't think off the top of my head what
could trigger the system hang when X or weston are executed, so I wondered
if you have any ideas.
I looked a bit more at this issue today and found that the system hangs
when dma_alloc_attrs() is called from lowlevel_buffer_allocate() in
drivers/gpu/drm/exynos/exynos_drm_buf.c [0]. The call chain is:

exynos_drm_gem_create_ioctl() -> exynos_drm_gem_create() ->
exynos_drm_alloc_buf() -> lowlevel_buffer_allocate() -> dma_alloc_attrs()

An interesting data point is that the allocation succeeds when a GEM buffer
is created for fbdev, but it causes a system hang when it is created from
the DRM ioctl API. For fbdev the call chain is:

exynos_drm_fbdev_init() -> exynos_drm_fbdev_create() ->
exynos_drm_gem_create() -> exynos_drm_alloc_buf() ->
lowlevel_buffer_allocate() -> dma_alloc_attrs()

Although for fbdev the EXYNOS_BO_CONTIG flag is passed to
exynos_drm_gem_create(), so the DMA_ATTR_FORCE_CONTIGUOUS attribute is set
before calling dma_alloc_attrs(). So I don't know if it is only working
because the allocated memory is physically contiguous, so the IOMMU mapping
is less of an issue here?

I also don't know if it is relevant, but I noticed that both for HDMI and
DP, the address of the allocated DMA buffer is always 0x2000, which is the
start of the iommu-reserved-mapping region for the exynos5420 peach pit,
while the address of the DMA buffer for the GEM buffer allocated with
EXYNOS_GEM_CREATE is always != 0x2000 (i.e. 0x2184c000).

So in summary, Exynos HDMI DRM can allocate GEM buffers correctly for both
fbdev and DRM, while Exynos DRM DP/FIMD can only allocate a GEM buffer for
fbdev; trying to allocate with the EXYNOS_GEM_CREATE ioctl hangs the
system. Both fbdev and DRM work correctly with the Exynos IOMMU disabled.

I included some DRM debug logs [2] (drm.debug=0xff) for DP/FIMD fbdev and
DRM GEM buffer creation.
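The pattern Javier observes (contiguous fbdev allocations always landing at the reserved-mapping base, ioctl allocations landing elsewhere) can be modeled with a toy two-pool bump allocator. Everything here is made up for illustration: the names, the base addresses, and the allocator itself are not the Exynos or DMA-mapping code, only a sketch of why the two paths would return addresses from different regions.

```c
#include <stdint.h>

#define ATTR_FORCE_CONTIGUOUS 1u    /* stands in for DMA_ATTR_FORCE_CONTIGUOUS */

#define CARVEOUT_BASE 0x20000000u   /* hypothetical reserved-mapping base  */
#define IOVA_BASE     0x21000000u   /* hypothetical IOMMU-managed VA pool  */

struct toy_dma {
    uint32_t carveout_next;
    uint32_t iova_next;
};

/* Allocations with the force-contiguous attribute (the fbdev /
 * EXYNOS_BO_CONTIG path) come from the reserved contiguous region; default
 * allocations (the EXYNOS_GEM_CREATE ioctl path) come from the
 * IOMMU-managed address space. */
static uint32_t toy_dma_alloc(struct toy_dma *d, uint32_t size, uint32_t attrs)
{
    uint32_t *next = (attrs & ATTR_FORCE_CONTIGUOUS) ? &d->carveout_next
                                                     : &d->iova_next;
    uint32_t addr = *next;
    *next += (size + 0xfffu) & ~0xfffu;   /* page-align the bump pointer */
    return addr;
}
```

If the hang is specific to the IOMMU-mapped pool, this split would explain why only the ioctl path triggers it while the contiguous fbdev path survives.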
Thanks a lot and best regards,
Javier

[0]: http://lxr.free-electrons.com/source/drivers/gpu/drm/exynos/exynos_drm_buf.c#L84
[1]: http://lxr.free-electrons.com/source/drivers/gpu/drm/exynos/exynos_drm_fbdev.c#L153
[2]: DP fbdev:

[4.181241] [drm:drm_enable_connectors] connector 31 enabled? yes
[4.181248] [drm:drm_target_preferred] looking for cmdline mode on connector 25
[4.181254] [drm:drm_target_preferred] looking for preferred mode on connector 25 0
[4.181259] [drm:drm_target_preferred] found mode 1366x768
[4.181265] [drm:drm_target_preferred] looking for cmdline mode on connector 31
[4.181270] [drm:drm_target_preferred] looking for preferred mode on connector 31 0
[4.181276] [drm:drm_target_preferred] found mode 1920x1080
[4.181281] [drm:drm_setup_crtcs] picking CRTCs for 4096x4096 config
[4.181290] [drm:drm_setup_crtcs] desired mode 1366x768 set on crtc 23 (0,0)
[4.181298] [drm:drm_setup_crtcs] desired mode 1920x1080 set on crtc 29 (0,0)
[4.181307] [drm:exynos_drm_fbdev_create] surface width(1920), height(1080) and bpp(32
[4.181314] [drm:exynos_drm_init_buf] desired size = 0x7e9000
[4.181327] [drm:exynos_drm_gem_init] created file object =