Re: [PATCH v11 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

2015-05-12 Thread Baoquan He
On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
 This patchset is an update of Bill Sumner's patchset, implements a fix for:
 If a kernel boots with intel_iommu=on on a system that supports intel vt-d, 
 when a panic happens, the kdump kernel will boot with these faults:

Test passed on my local workstation. Thanks, Zhenhua.

 
 dmar: DRHD: handling fault status reg 102
 dmar: DMAR:[DMA Read] Request device [01:00.0] fault addr fff8
 DMAR:[fault reason 01] Present bit in root entry is clear
 
 dmar: DRHD: handling fault status reg 2
 dmar: INTR-REMAP: Request device [[61:00.0] fault index 42
 INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear


Re: [PATCH v11 05/10] iommu/vt-d: Add functions to load and save old re

2015-05-12 Thread Dave Young
Seems the subject was truncated? Maybe re means root entry? Then please fix it

On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
 Add functions to load root entry table from old kernel, and to save updated
 root entry table.
 Add two members in struct intel_iommu, to store the RTA in the old kernel, and
 the mapped virt address of it.

Please explain a bit what is RTA in patch log, it is unclear to most of people
who do not know iommu details.

 
 We use the old RTA in the dump kernel, and when iommu->root_entry is used as
 a cache in the kdump kernel, its phys address will not be saved to the RTA register,
 but when its data is changed, we will save the new data to the old root entry
 table.
 
 Li, Zhen-hua:
 The functions and logics.
 
 Takao Indoh:
 Add __iommu_flush_cache.
 
 Signed-off-by: Li, Zhen-Hua zhen-h...@hp.com
 Signed-off-by: Takao Indoh indou.ta...@jp.fujitsu.com
 ---
  drivers/iommu/intel-iommu.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
  include/linux/intel-iommu.h |  3 +++
  2 files changed, 56 insertions(+), 1 deletion(-)
 
 diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
 index 0b97c15..3a5d446 100644
 --- a/drivers/iommu/intel-iommu.c
 +++ b/drivers/iommu/intel-iommu.c
 @@ -371,6 +371,10 @@ static struct context_entry *device_to_existing_context_entry(
   struct intel_iommu *iommu,
   u8 bus, u8 devfn);
  
 +static void __iommu_load_old_root_entry(struct intel_iommu *iommu);
 +
 +static void __iommu_update_old_root_entry(struct intel_iommu *iommu, int index);
 +
  /*
   * A structure used to store the address allocated by ioremap();
   * Then we need to call iounmap() to free them outside of spin_lock_irqsave/unlock;
 @@ -382,7 +386,6 @@ struct iommu_remapped_entry {
  static LIST_HEAD(__iommu_remapped_mem);
  static DEFINE_MUTEX(__iommu_mem_list_lock);
  
 -
  /*
   * This domain is a statically identity mapping domain.
   *   1. This domain creats a static 1:1 mapping to all usable memory.
 @@ -4935,3 +4938,52 @@ int __iommu_free_mapped_mem(void)
   return 0;
  }
  
 +/*
 + * Load the old root entry table to new root entry table.
 + */
 +static void __iommu_load_old_root_entry(struct intel_iommu *iommu)
 +{
 + if ((!iommu)
 + || (!iommu->root_entry)
 + || (!iommu->root_entry_old_virt)
 + || (!iommu->root_entry_old_phys))
 + return;
 + memcpy(iommu->root_entry, iommu->root_entry_old_virt, PAGE_SIZE);
 +
 + __iommu_flush_cache(iommu, iommu->root_entry, PAGE_SIZE);
 +}
 +
 +/*
 + * When the data in new root entry table is changed, this function
 + * must be called to save the updated data to old root entry table.
 + */
 +static void __iommu_update_old_root_entry(struct intel_iommu *iommu, int index)
 +{
 + unsigned long start;
 + unsigned long size;
 + void __iomem *to;
 + void *from;
 +
 + if ((!iommu)
 + || (!iommu->root_entry)
 + || (!iommu->root_entry_old_virt)
 + || (!iommu->root_entry_old_phys))
 + return;
 +
 + if (index < -1 || index >= ROOT_ENTRY_NR)
 + return;
 +
 + if (index == -1) {
 + start = 0;
 + size = ROOT_ENTRY_NR * sizeof(struct root_entry);
 + } else {
 + start = index * sizeof(struct root_entry);
 + size = sizeof(struct root_entry);
 + }
 + to = iommu->root_entry_old_virt;
 + from = iommu->root_entry;
 + memcpy(to + start, from + start, size);
 +
 + __iommu_flush_cache(iommu, to + start, size);
 +}
 +
 diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
 index ced1fac..e7cac12 100644
 --- a/include/linux/intel-iommu.h
 +++ b/include/linux/intel-iommu.h
 @@ -340,6 +340,9 @@ struct intel_iommu {
   spinlock_t  lock; /* protect context, domain ids */
   struct root_entry *root_entry; /* virtual address */
  
 + void __iomem *root_entry_old_virt; /* mapped from old root entry */
 + unsigned long   root_entry_old_phys; /* root entry in old kernel */
 +
   struct iommu_flush flush;
  #endif
   struct q_inval  *qi;/* Queued invalidation info */
 -- 
 2.0.0-rc0
 


Re: [PATCH v11 02/10] iommu/vt-d: Items required for kdump

2015-05-12 Thread Dave Young
On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
 Add context entry functions needed for kdump.
 
 Bill Sumner:
 Original version;
 
 Li, Zhenhua:
 Changed the name of new functions, make them consistent with current
 context get/set functions.
 Remove the structure dve which is not used in new version.
 
 Signed-off-by: Bill Sumner billsumnerli...@gmail.com
 Signed-off-by: Li, Zhen-Hua zhen-h...@hp.com
 ---
  drivers/iommu/intel-iommu.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  1 file changed, 72 insertions(+)
 
 diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
 index cb9d6cc..1e7ceb5 100644
 --- a/drivers/iommu/intel-iommu.c
 +++ b/drivers/iommu/intel-iommu.c
 @@ -190,6 +190,31 @@ struct root_entry {
  };
  #define ROOT_ENTRY_NR (VTD_PAGE_SIZE/sizeof(struct root_entry))
  
 +static inline bool root_present(struct root_entry *root)
 +{
 + return (root->lo & 1);
 +}
 +
 +static inline void set_root_value(struct root_entry *root, unsigned long value)
 +{
 + root->lo &= ~VTD_PAGE_MASK;
 + root->lo |= value & VTD_PAGE_MASK;
 +}
 +
 +static inline struct context_entry *
 +get_context_addr_from_root(struct root_entry *root)
 +{
 + return (struct context_entry *)
 + (root_present(root) ? phys_to_virt(
 + root->lo & VTD_PAGE_MASK) :
 + NULL);
 +}
 +
 +static inline unsigned long
 +get_context_phys_from_root(struct root_entry *root)
 +{
 + return root_present(root) ? (root->lo & VTD_PAGE_MASK) : 0;
 +}
  
  /*
   * low 64 bits:
 @@ -211,6 +236,32 @@ static inline bool context_present(struct context_entry *context)
  {
  return (context->lo & 1);
  }
 +
 +static inline int context_fault_enable(struct context_entry *c)
 +{
 + return ((c->lo >> 1) & 0x1);
 +}
 +
 +static inline int context_translation_type(struct context_entry *c)
 +{
 + return ((c->lo >> 2) & 0x3);
 +}
 +
 +static inline u64 context_address_root(struct context_entry *c)
 +{
 + return ((c->lo >> VTD_PAGE_SHIFT));
 +}
 +
 +static inline int context_address_width(struct context_entry *c)
 +{
 + return ((c->hi >> 0) & 0x7);
 +}
 +
 +static inline int context_domain_id(struct context_entry *c)
 +{
 + return ((c->hi >> 8) & 0xffff);
 +}
 +
  static inline void context_set_present(struct context_entry *context)
  {
  context->lo |= 1;
 @@ -296,6 +347,27 @@ static inline int first_pte_in_page(struct dma_pte *pte)
  return !((unsigned long)pte & ~VTD_PAGE_MASK);
  }
  
 +
 +/*
 + * Fix Crashdump failure caused by leftover DMA through a hardware IOMMU
 + *
 + * Fixes the crashdump kernel to deal with an active iommu and legacy
 + * DMA from the (old) panicked kernel in a manner similar to how legacy
 + * DMA is handled when no hardware iommu was in use by the old kernel --
 + * allow the legacy DMA to continue into its current buffers.
 + *
 + * In the crashdump kernel, this code:
 + * 1. skips disabling the IOMMU's translating.
 + * 2. Do not re-enable IOMMU's translating.
 + * 3. In kdump kernel, use the old root entry table.
 + * 4. Allocate pages for new context entry, copy data from old context 
 entries
 + *in the old kernel to the new ones.
 + *
 + * In other kinds of kernel, for example, a kernel started by kexec,
 + * do the same thing as crashdump kernel.
 + */
 +
 +

Above comments should come along with the code changes instead of putting it
in this patch.

BTW, there's one more blank line at the end..

  /*
   * This domain is a statically identity mapping domain.
   *   1. This domain creats a static 1:1 mapping to all usable memory.
 -- 
 2.0.0-rc0
 


Re: [PATCH v11 04/10] iommu/vt-d: functions to copy data from old mem

2015-05-12 Thread Dave Young
On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
 Add some functions to copy the data from old kernel.
 These functions are used to copy context tables and page tables.
 
 To avoid calling iounmap between spin_lock_irqsave and spin_unlock_irqrestore,
 use a linked list here: store the pointers, and then use iounmap to free them in
 another place.
 
 Li, Zhen-hua:
 The functions and logics.
 
 Takao Indoh:
 Check if pfn is ram:
 if (page_is_ram(pfn))
 
 Signed-off-by: Li, Zhen-Hua zhen-h...@hp.com
 Signed-off-by: Takao Indoh indou.ta...@jp.fujitsu.com
 ---
  drivers/iommu/intel-iommu.c | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  include/linux/intel-iommu.h |   6 +++
  2 files changed, 108 insertions(+)
 
 diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
 index 07e6118..0b97c15 100644
 --- a/drivers/iommu/intel-iommu.c
 +++ b/drivers/iommu/intel-iommu.c
 @@ -371,6 +371,17 @@ static struct context_entry *device_to_existing_context_entry(
   struct intel_iommu *iommu,
   u8 bus, u8 devfn);
  
 +/*
 + * A structure used to store the address allocated by ioremap();
 + * Then we need to call iounmap() to free them outside of spin_lock_irqsave/unlock;
 + */
 +struct iommu_remapped_entry {
 + struct list_head list;
 + void __iomem *mem;
 +};
 +static LIST_HEAD(__iommu_remapped_mem);
 +static DEFINE_MUTEX(__iommu_mem_list_lock);
 +
  
  /*
   * This domain is a statically identity mapping domain.
 @@ -4833,3 +4844,94 @@ static struct context_entry *device_to_existing_context_entry(
   return ret;
  }
  
 +/*
 + * Copy memory from a physically-addressed area into a virtually-addressed area
 + */
 +int __iommu_load_from_oldmem(void *to, unsigned long from, unsigned long size)
 +{
 + unsigned long pfn;  /* Page Frame Number */
 + size_t csize = (size_t)size;/* Num(bytes to copy) */
 + unsigned long offset;   /* Lower 12 bits of to */

The comments for the above variables are unnecessary; they are clear enough.
BTW, using size_t for the size argument is better.

 + void __iomem *virt_mem;
 + struct iommu_remapped_entry *mapped;
 +
 + pfn = from >> VTD_PAGE_SHIFT;
 + offset = from & (~VTD_PAGE_MASK);
 +
 + if (page_is_ram(pfn)) {
 + memcpy(to, pfn_to_kaddr(pfn) + offset, csize);
 + } else {
 +
 + mapped = kzalloc(sizeof(struct iommu_remapped_entry),
 + GFP_KERNEL);
 + if (!mapped)
 + return -ENOMEM;
 +
 + virt_mem = ioremap_cache((unsigned long)from, size);
 + if (!virt_mem) {
 + kfree(mapped);
 + return -ENOMEM;
 + }
 + memcpy(to, virt_mem, size);

csize or size? As in the previous comment, if you use size_t for the argument it will be fine here.

 +
 + mutex_lock(&__iommu_mem_list_lock);
 + mapped->mem = virt_mem;
 + list_add_tail(&mapped->list, &__iommu_remapped_mem);
 + mutex_unlock(&__iommu_mem_list_lock);
 + }
 + return size;

type mismatch?

 +}
 +
 +/*
 + * Copy memory from a virtually-addressed area into a physically-addressed area
 + */
 +int __iommu_save_to_oldmem(unsigned long to, void *from, unsigned long size)
 +{
 + unsigned long pfn;  /* Page Frame Number */
 + size_t csize = (size_t)size;/* Num(bytes to copy) */
 + unsigned long offset;   /* Lower 12 bits of to */

Ditto about data type and comments..

 + void __iomem *virt_mem;
 + struct iommu_remapped_entry *mapped;
 +
 + pfn = to >> VTD_PAGE_SHIFT;
 + offset = to & (~VTD_PAGE_MASK);
 +
 + if (page_is_ram(pfn)) {
 + memcpy(pfn_to_kaddr(pfn) + offset, from, csize);
 + } else {
 + mapped = kzalloc(sizeof(struct iommu_remapped_entry),
 + GFP_KERNEL);
 + if (!mapped)
 + return -ENOMEM;
 +
 + virt_mem = ioremap_cache((unsigned long)to, size);
 + if (!virt_mem) {
 + kfree(mapped);
 + return -ENOMEM;
 + }
 + memcpy(virt_mem, from, size);
 + mutex_lock(&__iommu_mem_list_lock);
 + mapped->mem = virt_mem;
 + list_add_tail(&mapped->list, &__iommu_remapped_mem);
 + mutex_unlock(&__iommu_mem_list_lock);
 + }
 + return size;
 +}

The above two functions look duplicated; can they be merged into one function
with an argument for the copy direction?
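
For what it's worth, a minimal sketch of what such a merged helper could look like, reusing the list bookkeeping from this patch. The name __iommu_copy_oldmem and the direction enum are hypothetical, and it returns 0/-errno instead of the size, which also sidesteps the return-type question raised further down:

/* Hypothetical sketch only, not part of the posted series. */
enum oldmem_dir { OLDMEM_TO_KERNEL, KERNEL_TO_OLDMEM };

static int __iommu_copy_oldmem(void *virt, unsigned long phys,
			       size_t size, enum oldmem_dir dir)
{
	unsigned long pfn = phys >> VTD_PAGE_SHIFT;
	unsigned long offset = phys & ~VTD_PAGE_MASK;
	struct iommu_remapped_entry *mapped;
	void __iomem *virt_mem;
	void *old;

	if (page_is_ram(pfn)) {
		old = pfn_to_kaddr(pfn) + offset;
	} else {
		mapped = kzalloc(sizeof(*mapped), GFP_KERNEL);
		if (!mapped)
			return -ENOMEM;

		/* phys already contains the page offset, as in the patch. */
		virt_mem = ioremap_cache(phys, size);
		if (!virt_mem) {
			kfree(mapped);
			return -ENOMEM;
		}
		old = (void __force *)virt_mem;

		mutex_lock(&__iommu_mem_list_lock);
		mapped->mem = virt_mem;
		list_add_tail(&mapped->list, &__iommu_remapped_mem);
		mutex_unlock(&__iommu_mem_list_lock);
	}

	if (dir == OLDMEM_TO_KERNEL)
		memcpy(virt, old, size);
	else
		memcpy(old, virt, size);

	return 0;
}

The two existing entry points would then become thin wrappers passing OLDMEM_TO_KERNEL or KERNEL_TO_OLDMEM respectively.
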

 +
 +/*
 + * Free the mapped memory for ioremap;
 + */
 +int __iommu_free_mapped_mem(void)
 +{
 + struct iommu_remapped_entry *mem_entry, *tmp;
 +
 + mutex_lock(&__iommu_mem_list_lock);
 + list_for_each_entry_safe(mem_entry, tmp, &__iommu_remapped_mem, list) {
 + iounmap(mem_entry->mem);
 + list_del(&mem_entry->list);
 + kfree(mem_entry);
 + }
 + 

Re: [PATCH v11 06/10] iommu/vt-d: datatypes and functions used for kdump

2015-05-12 Thread Dave Young
The patch subject is bad: in a previous patch you use "Items required for kdump", and in this
patch you use "datatypes and functions used for kdump", so what's the
difference between these two patches?

Please think about better subjects for these patches.

On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
 Populate it with support functions to copy iommu translation tables from
 the panicked kernel into the kdump kernel in the event of a crash.
 
 Functions:
 Use old root entry table, and load the old data to root_entry as cache.
 Malloc new context table and copy old context table to the new one.
 
 Bill Sumner:
 Original version, the creation of the data types and functions.
 
 Li, Zhenhua:
 Create new function iommu_check_pre_te_status() to check status.
 Update the caller of context_get_* and context_put*, use context_*
 and context_set_* for replacement.
 Update the name of the function that loads root entry table.
 Use new function to copy old context entry tables and page tables.
 Use unsigned long for physical address.
 Remove the functions to copy page table in Bill's version.
 Remove usage of dve and ppap in Bill's version.
 
 Signed-off-by: Bill Sumner billsumnerli...@gmail.com
 Signed-off-by: Li, Zhen-Hua zhen-h...@hp.com
 ---
  drivers/iommu/intel-iommu.c | 121 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  include/linux/intel-iommu.h |   3 ++
  2 files changed, 124 insertions(+)
 
 diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
 index 3a5d446..28c3c64 100644
 --- a/drivers/iommu/intel-iommu.c
 +++ b/drivers/iommu/intel-iommu.c
 @@ -386,6 +386,18 @@ struct iommu_remapped_entry {
  static LIST_HEAD(__iommu_remapped_mem);
  static DEFINE_MUTEX(__iommu_mem_list_lock);
  
 +/* 

Please remove the ..

 + * Copy iommu translation tables from old kernel into new  kernel.

One more space between new and kernel

 + * Entry to this set of functions is: intel_iommu_load_translation_tables()
 + * --------------------------------------------------------------------

Drop above --- line is better

 + */
 +
 +static int copy_root_entry_table(struct intel_iommu *iommu);
 +
 +static int intel_iommu_load_translation_tables(struct intel_iommu *iommu);
 +
 +static void iommu_check_pre_te_status(struct intel_iommu *iommu);
 +
  /*
   * This domain is a statically identity mapping domain.
   *   1. This domain creats a static 1:1 mapping to all usable memory.
 @@ -4987,3 +4999,112 @@ static void __iommu_update_old_root_entry(struct intel_iommu *iommu, int index)
   __iommu_flush_cache(iommu, to + start, size);
  }
  
 +/*
 + * Load root entry tables from old kernel.
 + */
 +static int copy_root_entry_table(struct intel_iommu *iommu)
 +{
 + u32 bus;/* Index: root-entry-table */
 + struct root_entry  *re; /* Virt(iterator: new table) */
 + unsigned long context_old_phys; /* Phys(context table entry) */
 + struct context_entry *context_new_virt; /* Virt(new context_entry) */
 +
 + /*
 +  * A new root entry table has been allocated ,

One more space before ,'

 +  * we need copy re from old kernel to the new allocated one.
 +  */
 +
 + if (!iommu->root_entry_old_phys)
 + return -ENOMEM;
 +
 + for (bus = 0, re = iommu->root_entry; bus < 256; bus += 1, re += 1) {
 + if (!root_present(re))
 + continue;
 +
 + context_old_phys = get_context_phys_from_root(re);
 +
 + if (!context_old_phys)
 + continue;
 +
 + context_new_virt =
 + (struct context_entry *)alloc_pgtable_page(iommu->node);
 +
 + if (!context_new_virt)
 + return -ENOMEM;
 +
 + __iommu_load_from_oldmem(context_new_virt,
 + context_old_phys,
 + VTD_PAGE_SIZE);
 +
 + __iommu_flush_cache(iommu, context_new_virt, VTD_PAGE_SIZE);
 +
 + set_root_value(re, virt_to_phys(context_new_virt));
 + }
 +
 + return 0;
 +}
 +
 +/*
 + * Interface to the load translation tables set of functions
 + * from mainline code.
 + */
 +static int intel_iommu_load_translation_tables(struct intel_iommu *iommu)
 +{
 + unsigned long long q;   /* quadword scratch */
 + int ret = 0;/* Integer return code */

The comment for ret is not necessary, and 'quadword' is also unnecessary; just explaining
the purpose of 'q' is enough.

 + unsigned long flags;
 +
 + q = dmar_readq(iommu->reg + DMAR_RTADDR_REG);
 + if (!q)
 + return -1;
 +
 + spin_lock_irqsave(&iommu->lock, flags);
 +
 + /* Load the root-entry table from the old kernel
 +  * foreach context_entry_table in root_entry
 +  *   Copy each entry table from old kernel
 +  */
 + if 

Re: [PATCH 2/3] iommu/arm-smmu: Add initial driver support for ARM SMMUv3 devices

2015-05-12 Thread Will Deacon
Hi Leizhen,

Thanks for the review!

On Tue, May 12, 2015 at 08:40:06AM +0100, leizhen wrote:
 
  +
  +static int queue_poll_cons(struct arm_smmu_queue *q, u32 until, bool wfe)
  +{
  + ktime_t timeout = ktime_add_us(ktime_get(), ARM_SMMU_POLL_TIMEOUT_US);
  +
  + while (queue_sync_cons(q), __queue_cons_before(q, until)) {
  + if (ktime_compare(ktime_get(), timeout) > 0)
 
 Is it good to limit hardware behavior like this? Maybe waiting forever would be
 better. If the SMMU cannot consume queue items under normal conditions, the
 SMMU hardware is broken and will lead the software system to crash later.

I disagree. Having a broken SMMU lock up the entire kernel is considerably
worse than having e.g. a DMA master get stuck. If this timeout expires,
then we'll print a message and continue without waiting any longer for
a CMD_SYNC completion (see arm_smmu_cmdq_issue_cmd).

  + return -ETIMEDOUT;
  +
  + if (wfe) {
  + wfe();
  + } else {
  + cpu_relax();
  + udelay(1);
  + }
  + }
  +
  + return 0;
  +}
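
For illustration, a rough sketch of the behaviour described above; this is not the actual arm_smmu_cmdq_issue_cmd(), and the helper name is made up:

static void arm_smmu_sync_poll_sketch(struct arm_smmu_device *smmu,
				      struct arm_smmu_queue *q,
				      u32 until, bool wfe)
{
	/* Give up after the timeout instead of spinning forever. */
	if (queue_poll_cons(q, until, wfe) == -ETIMEDOUT)
		dev_err_ratelimited(smmu->dev,
				    "CMD_SYNC timeout, continuing without completion\n");

	/* Execution continues: a stuck DMA master beats a hung kernel. */
}
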
  +
  +static void queue_write(__le64 *dst, u64 *src, size_t n_dwords)
  +{
  + int i;
  +
  + for (i = 0; i < n_dwords; ++i)
  + *dst++ = cpu_to_le64(*src++);
  +}
  +
  +static int queue_insert_raw(struct arm_smmu_queue *q, u64 *ent)
  +{
  + if (queue_full(q))
  + return -ENOSPC;
  +
  + queue_write(Q_ENT(q, q->prod), ent, q->ent_dwords);
 
 A dmb or dsb may be needed. We must ensure all data writes have completed
 before notifying the hardware to consume.

The dsb is performed when we update the producer pointer in queue_inc_prod
(since we use writel as opposed to writel_relaxed).

  + queue_inc_prod(q);
  + return 0;
  +}
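
To illustrate that ordering point, an assumed shape of queue_inc_prod() (wrap/overflow handling omitted, and the prod_reg field name is an assumption here, not taken from the quoted hunks):

static void queue_inc_prod(struct arm_smmu_queue *q)
{
	q->prod++;	/* sketch only: the real code also handles index wrap */

	/*
	 * writel(), unlike writel_relaxed(), orders the earlier
	 * queue_write() stores before the MMIO write that publishes the
	 * new producer index, so no explicit dsb is needed in
	 * queue_insert_raw().
	 */
	writel(q->prod, q->prod_reg);
}
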
  +
  +
  +/* High-level queue accessors */
  +static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
  +{
  + memset(cmd, 0, CMDQ_ENT_DWORDS << 3);
  + cmd[0] |= (ent->opcode & CMDQ_0_OP_MASK) << CMDQ_0_OP_SHIFT;
  +
  + switch (ent->opcode) {
  + case CMDQ_OP_TLBI_EL2_ALL:
 
  + case CMDQ_OP_CMD_SYNC:
  + cmd[0] |= CMDQ_SYNC_0_CS_SEV;
 
 We cannot always set SIG_SEV; actually it should be based upon
 SMMU_IDR0.SEV (smmu->features & ARM_SMMU_FEAT_SEV).

It doesn't matter for the CMD_SYNC command (which treats SIG_SEV as
SIG_NONE in this case). What actually matters is that we don't perform
wfe() when the SMMU can't issue the event, but that's already taken care
of in arm_smmu_cmdq_issue_cmd.

  + break;
  + default:
  + return -ENOENT;
  + }
  +
  + return 0;
  +}
 
  +
  +static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
  +struct io_pgtable_cfg *pgtbl_cfg)
  +{
  + int ret;
  + u16 asid;
  + struct arm_smmu_device *smmu = smmu_domain->smmu;
  + struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
  +
  + asid = arm_smmu_bitmap_alloc(smmu->asid_map, smmu->asid_bits);
  + if (IS_ERR_VALUE(asid))
  + return asid;
  +
  + cfg->cdptr = dma_zalloc_coherent(smmu->dev, CTXDESC_CD_DWORDS << 3,
  +  &cfg->cdptr_dma, GFP_KERNEL);
 
 Why use dma_zalloc_coherent? The iova is converted from the PA by calling
 phys_to_dma. I'm afraid the PA and iova may not be equal. In fact, the mapping
 between iova and PA relies on the SMMU driver itself. Why not use
 virt_to_phys to get the PA? The SMMU hardware actually requires a PA.

The SMMU structure walker is not allowed to be chained into another SMMU,
so the physical and DMA addresses are equal here. That said, the walker
does not need to be cache coherent, so I need to use the coherent DMA
allocator to get memory of the correct attributes. I also need the table
to be naturally aligned.

An alternative would be something like __get_free_pages with homebrew
cache maintenance for non-coherent SMMUs, but I don't want to re-invent
the DMA mapping code in the driver (we already have a similar mess with
the page tables, which I'd like to sort out).

  +static int arm_smmu_alloc_l2_strtab(struct arm_smmu_device *smmu)
  +{
  + int ret;
  + unsigned int i;
  + struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
  + size_t size = sizeof(*cfg->l1_desc) * cfg->num_l1_descs;
  + void *strtab = smmu->strtab_cfg.strtab;
  +
  + cfg->l1_desc = devm_kzalloc(smmu->dev, size, GFP_KERNEL);
  + if (!cfg->l1_desc) {
  + dev_err(smmu->dev, "failed to allocate l1 stream table desc\n");
  + return -ENOMEM;
  + }
  +
  + size = 1 << (STRTAB_SPLIT + ilog2(STRTAB_STE_DWORDS) + 3);
  + for (i = 0; i < cfg->num_l1_descs; ++i) {
  + struct arm_smmu_strtab_l1_desc *desc = &cfg->l1_desc[i];
  +
  + desc->span = STRTAB_SPLIT + 1;
  + desc->l2ptr = dma_zalloc_coherent(smmu->dev, size,
  +   &desc->l2ptr_dma, GFP_KERNEL);
 
 No, no, 

Re: [RFC PATCH] iommu: add ARM short descriptor page table allocator.

2015-05-12 Thread Matthias Brugger
2015-04-28 9:41 GMT+02:00 Yong Wu yong...@mediatek.com:
 This patch is for the ARM Short Descriptor Format. It has a 2-level
 pagetable and the allocator supports 4K/64K/1M/16M.

 Signed-off-by: Yong Wu yong...@mediatek.com
 ---
  drivers/iommu/Kconfig|   7 +
  drivers/iommu/Makefile   |   1 +
  drivers/iommu/io-pgtable-arm-short.c | 489 +++++++++++++++++++++++++++++++++++++++
  drivers/iommu/io-pgtable.c   |   4 +
  drivers/iommu/io-pgtable.h   |   6 +
  5 files changed, 507 insertions(+)
  create mode 100644 drivers/iommu/io-pgtable-arm-short.c

 diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
 index baa0d97..8e50e73 100644
 --- a/drivers/iommu/Kconfig
 +++ b/drivers/iommu/Kconfig
 @@ -38,6 +38,13 @@ config IOMMU_IO_PGTABLE_LPAE_SELFTEST

   If unsure, say N here.

 +config IOMMU_IO_PGTABLE_SHORT
 +   bool "ARMv7/v8 Short Descriptor Format"
 +   select IOMMU_IO_PGTABLE
 +   help
 + Enable support for the ARM Short descriptor pagetable format.
 + It has a 2-level pagetable and the allocator supports 4K/64K/1M/16M.
 +
  endmenu

  config IOMMU_IOVA
 diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
 index 080ffab..815b3c8 100644
 --- a/drivers/iommu/Makefile
 +++ b/drivers/iommu/Makefile
 @@ -3,6 +3,7 @@ obj-$(CONFIG_IOMMU_API) += iommu-traces.o
  obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
 +obj-$(CONFIG_IOMMU_IO_PGTABLE_SHORT) += io-pgtable-arm-short.o
  obj-$(CONFIG_IOMMU_IOVA) += iova.o
  obj-$(CONFIG_OF_IOMMU) += of_iommu.o
  obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o msm_iommu_dev.o
 diff --git a/drivers/iommu/io-pgtable-arm-short.c 
 b/drivers/iommu/io-pgtable-arm-short.c
 new file mode 100644
 index 000..8d723ec
 --- /dev/null
 +++ b/drivers/iommu/io-pgtable-arm-short.c
 @@ -0,0 +1,489 @@
 +/*
 + * Copyright (c) 2014-2015 MediaTek Inc.
 + * Author: Yong Wu yong...@mediatek.com
 + *
 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License version 2 as
 + * published by the Free Software Foundation.
 + *
 + * This program is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 + * GNU General Public License for more details.
 + */
 +#define pr_fmt(fmt)    "arm-short-desc io-pgtable: " fmt
 +
 +#include <linux/err.h>
 +#include <linux/mm.h>
 +#include <linux/iommu.h>
 +#include <linux/errno.h>
 +
 +#include "io-pgtable.h"
 +
 +typedef u32 arm_short_iopte;
 +
 +struct arm_short_io_pgtable {
 +   struct io_pgtable   iop;
 +   struct kmem_cache   *ptekmem;
 +   size_t  pgd_size;
 +   void*pgd;
 +};
 +
 +#define io_pgtable_short_to_data(x)\
 +   container_of((x), struct arm_short_io_pgtable, iop)
 +
 +#define io_pgtable_ops_to_pgtable(x)   \
 +   container_of((x), struct io_pgtable, ops)
 +
 +#define io_pgtable_short_ops_to_data(x)\
 +   io_pgtable_short_to_data(io_pgtable_ops_to_pgtable(x))
 +
 +#define ARM_SHORT_MAX_ADDR_BITS    32
 +
 +#define ARM_SHORT_PGDIR_SHIFT  20
 +#define ARM_SHORT_PAGE_SHIFT   12
 +#define ARM_SHORT_PTRS_PER_PTE 256
 +#define ARM_SHORT_BYTES_PER_PTE    1024
 +
 +/* 1 level pagetable */
 +#define ARM_SHORT_F_PGD_TYPE_PAGE  (0x1)
 +#define ARM_SHORT_F_PGD_TYPE_PAGE_MSK  (0x3)
 +#define ARM_SHORT_F_PGD_TYPE_SECTION   (0x2)
 +#define ARM_SHORT_F_PGD_TYPE_SUPERSECTION  (0x2 | (1 << 18))
 +#define ARM_SHORT_F_PGD_TYPE_SECTION_MSK   (0x3 | (1 << 18))
 +#define ARM_SHORT_F_PGD_TYPE_IS_PAGE(pgd)  (((pgd) & 0x3) == 1)
 +#define ARM_SHORT_F_PGD_TYPE_IS_SECTION(pgd)   \
 +   (((pgd) & ARM_SHORT_F_PGD_TYPE_SECTION_MSK) \
 +   == ARM_SHORT_F_PGD_TYPE_SECTION)
 +#define ARM_SHORT_F_PGD_TYPE_IS_SUPERSECTION(pgd)  \
 +   (((pgd) & ARM_SHORT_F_PGD_TYPE_SECTION_MSK) \
 +   == ARM_SHORT_F_PGD_TYPE_SUPERSECTION)
 +
 +#define ARM_SHORT_F_PGD_B_BIT  BIT(2)
 +#define ARM_SHORT_F_PGD_C_BIT  BIT(3)
 +#define ARM_SHORT_F_PGD_IMPLE_BIT  BIT(9)
 +#define ARM_SHORT_F_PGD_S_BIT  BIT(16)
 +#define ARM_SHORT_F_PGD_NG_BIT BIT(17)
 +#define ARM_SHORT_F_PGD_NS_BIT_PAGE    BIT(3)
 +#define ARM_SHORT_F_PGD_NS_BIT_SECTION BIT(19)
 +
 +#define ARM_SHORT_F_PGD_PA_PAGETABLE_MSK   0xfffffc00
 +#define ARM_SHORT_F_PGD_PA_SECTION_MSK 0xfff00000
 +#define ARM_SHORT_F_PGD_PA_SUPERSECTION_MSK    0xff000000
 +
 +/* 2 level pagetable */
 +#define ARM_SHORT_F_PTE_TYPE_GET(val)  ((val) & 0x3)
 +#define 

Re: [PATCH v11 05/10] iommu/vt-d: Add functions to load and save old re

2015-05-12 Thread Li, ZhenHua

On 05/12/2015 04:37 PM, Dave Young wrote:

Seems the subject was truncated? Maybe re means root entry? Then please fix it

On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:

Add functions to load root entry table from old kernel, and to save updated
root entry table.
Add two members in struct intel_iommu, to store the RTA in the old kernel, and
the mapped virt address of it.


Please explain a bit what is RTA in patch log, it is unclear to most of people
who do not know iommu details.





If people want to understand this patchset, I assume they have read the 
vt-d specs. RTA is defined clearly in this spec.




Re: [PATCH v11 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

2015-05-12 Thread Li, ZhenHua

One thing must be pointed out:
There is a known issue that the hpsa driver cannot work well in the kdump
kernel, and this patchset is not intended to fix that problem.


So this patchset does not help HP Smart Array devices, which need the hpsa
driver.


On 05/11/2015 05:52 PM, Li, Zhen-Hua wrote:

This patchset is an update of Bill Sumner's patchset, implements a fix for:
If a kernel boots with intel_iommu=on on a system that supports intel vt-d,
when a panic happens, the kdump kernel will boot with these faults:

 dmar: DRHD: handling fault status reg 102
 dmar: DMAR:[DMA Read] Request device [01:00.0] fault addr fff8
 DMAR:[fault reason 01] Present bit in root entry is clear

 dmar: DRHD: handling fault status reg 2
 dmar: INTR-REMAP: Request device [[61:00.0] fault index 42
 INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear

On some systems, the interrupt remapping fault will also happen even if
intel_iommu is not set to on, because interrupt remapping will be enabled
when x2apic is needed by the system.

The cause of the DMA fault is described in Bill's original version, and the
INTR-Remap fault is caused by a similar reason. In short, the initialization
of the vt-d drivers causes the in-flight DMA and interrupt requests to get the
wrong response.

To fix this problem, we modify the behavior of the intel vt-d in the
crashdump kernel:

For DMA Remapping:
1. To accept the vt-d hardware in an active state,
2. Do not disable and re-enable the translation, keep it enabled.
3. Use the old root entry table, do not rewrite the RTA register.
4. Malloc and use new context entry table, copy data from the old ones that
used by the old kernel.
5. Keep using the old page tables before driver is loaded.
6. After device driver is loaded, when it issues the first dma_map command,
free the dmar_domain structure for this device, and generate a new one, so
that the device can be assigned a new and empty page table.
7. When a new context entry table is generated, we also save its address to
the old root entry table.

For Interrupt Remapping:
1. To accept the vt-d hardware in an active state,
2. Do not disable and re-enable the interrupt remapping, keep it enabled.
3. Use the old interrupt remapping table, do not rewrite the IRTA register.
4. When ioapic entry is setup, the interrupt remapping table is changed, and
the updated data will be stored to the old interrupt remapping table.

Advantages of this approach:
1. All manipulation of the IO-device is done by the Linux device-driver
for that device.
2. This approach behaves in a manner very similar to operation without an
active iommu.
3. Any activity between the IO-device and its RMRR areas is handled by the
device-driver in the same manner as during a non-kdump boot.
4. If an IO-device has no driver in the kdump kernel, it is simply left alone.
This supports the practice of creating a special kdump kernel without
drivers for any devices that are not required for taking a crashdump.
5. Minimal code-changes among the existing mainline intel vt-d code.

Summary of changes in this patch set:
1. Added some useful function for root entry table in code intel-iommu.c
2. Added new members to struct root_entry and struct irte;
3. Functions to load old root entry table to iommu->root_entry from the memory
of old kernel.
4. Functions to malloc new context entry table and copy the data from the old
ones to the malloced new ones.
5. Functions to enable support for DMA remapping in kdump kernel.
6. Functions to load old irte data from the old kernel to the kdump kernel.
7. Some code changes that support other behaviours that have been listed.
8. In the new functions, use physical address as unsigned long type, not
pointers.

Original version by Bill Sumner:
 https://lkml.org/lkml/2014/1/10/518
 https://lkml.org/lkml/2014/4/15/716
 https://lkml.org/lkml/2014/4/24/836

Zhenhua's updates:
 https://lkml.org/lkml/2014/10/21/134
 https://lkml.org/lkml/2014/12/15/121
 https://lkml.org/lkml/2014/12/22/53
 https://lkml.org/lkml/2015/1/6/1166
 https://lkml.org/lkml/2015/1/12/35
 https://lkml.org/lkml/2015/3/19/33
 https://lkml.org/lkml/2015/4/10/135

Changelog[v11]:
 1. Fix some conflicts with the latest upstream kernel.
Add flush in iommu_context_addr

Changelog[v10]:
 1. Do not use CONFIG_CRASH_DUMP and is_kdump_kernel().
Use one flag which stores the te and ir status in last kernel:
iommu->pre_enabled_trans
iommu->pre_enabled_ir

Changelog[v9]:
 1. Add new function iommu_attach_domain_with_id.
 2. Do not copy old page tables, keep using the old ones.
 3. Remove functions:
intel_iommu_did_to_domain_values_entry
intel_iommu_get_dids_from_old_kernel
device_to_domain_id
copy_page_addr
copy_page_table
copy_context_entry

Re: [PATCH v11 02/10] iommu/vt-d: Items required for kdump

2015-05-12 Thread Li, ZhenHua

On 05/12/2015 04:17 PM, Dave Young wrote:

On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:

Add context entry functions needed for kdump.
+/*
+ * Fix Crashdump failure caused by leftover DMA through a hardware IOMMU
+ *
+ * Fixes the crashdump kernel to deal with an active iommu and legacy
+ * DMA from the (old) panicked kernel in a manner similar to how legacy
+ * DMA is handled when no hardware iommu was in use by the old kernel --
+ * allow the legacy DMA to continue into its current buffers.
+ *
+ * In the crashdump kernel, this code:
+ * 1. skips disabling the IOMMU's translating.
+ * 2. Do not re-enable IOMMU's translating.
+ * 3. In kdump kernel, use the old root entry table.
+ * 4. Allocate pages for new context entry, copy data from old context entries
+ *in the old kernel to the new ones.
+ *
+ * In other kinds of kernel, for example, a kernel started by kexec,
+ * do the same thing as crashdump kernel.
+ */
+
+


Above comments should come along with the code changes instead of putting it
in this patch.

BTW, there's one more blank line at the end..

The code change is in 00/10, the cover letter.
And the blank line does not matter; I checked the patch with
scripts/checkpatch.pl: no warnings, no errors.





Re: [PATCH v11 07/10] iommu/vt-d: enable kdump support in iommu module

2015-05-12 Thread Baoquan He
On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
 Modify the operation of the following functions when called during crash dump:
 iommu_context_addr
 free_context_table
 get_domain_for_dev
 init_dmars
 intel_iommu_init
 
 Bill Sumner:
 Original version.
 
 Zhenhua:
 The name of new calling functions.
 Do not disable and re-enable TE in kdump kernel.
 Use the did and gaw from old context entry;
 
 Signed-off-by: Bill Sumner billsumnerli...@gmail.com
 Signed-off-by: Li, Zhen-Hua zhen-h...@hp.com
 ---
  drivers/iommu/intel-iommu.c | 95 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------
  1 file changed, 83 insertions(+), 12 deletions(-)
 
 diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
 index 28c3c64..91545bf 100644
 --- a/drivers/iommu/intel-iommu.c
 +++ b/drivers/iommu/intel-iommu.c
 @@ -397,6 +397,7 @@ static int copy_root_entry_table(struct intel_iommu *iommu);
  static int intel_iommu_load_translation_tables(struct intel_iommu *iommu);
  
  static void iommu_check_pre_te_status(struct intel_iommu *iommu);
 +static u8 g_translation_pre_enabled;
Hi Zhenhua,

I haven't checked patch one by one, am going through the code flow.

About g_translation_pre_enabled, I don't think it's necessary to define
it as a global variable. Both its assignment and judgement are in
init_dmars(). In this situation a local variable translation_pre_enabled
in init_dmars() is enough.

You can assign value to it here:

iommu_check_pre_te_status(iommu);
if (iommu->pre_enabled_trans) {
translation_pre_enabled = 1;
...
}
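
A heavily abridged sketch of that suggestion (init_dmars() obviously does much more than shown here; the point is only that the flag can live on the stack):

static int __init init_dmars(void)
{
	struct dmar_drhd_unit *drhd;
	struct intel_iommu *iommu;
	int translation_pre_enabled = 0;	/* local, not a file-scope global */

	for_each_active_iommu(iommu, drhd) {
		iommu_check_pre_te_status(iommu);
		if (iommu->pre_enabled_trans) {
			translation_pre_enabled = 1;
			/* ... load the old translation tables ... */
		}
		/* ... rest of the per-IOMMU setup ... */
	}

	if (translation_pre_enabled) {
		/* ... skip the RMRR/identity-map setup, as in the patch ... */
	}

	return 0;
}
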

Thanks
Baoquan


Re: [PATCH v11 07/10] iommu/vt-d: enable kdump support in iommu module

2015-05-12 Thread Baoquan He
On 05/13/15 at 10:28am, Li, ZhenHua wrote:
 +static u8 g_translation_pre_enabled;
 Hi Zhenhua,
 
 I haven't checked patch one by one, am going through the code flow.
 
 About g_translation_pre_enabled, I don't think it's necessary to define
 it as a global variable. Both its assignment and judgement are in
 init_dmars(). In this situation a local variable translation_pre_enabled
 in init_dmars() is enough.
 
 You can assign value to it here:
 
  iommu_check_pre_te_status(iommu);
 if (iommu->pre_enabled_trans) {
  translation_pre_enabled = 1;
  ...
  }
 
 Thanks
 Baoquan
 
 Hi Baoquan,
 This variable is only used in this file, for it is defined as static.
 For now, I think both a global and a local variable are fine; they achieve the
 same thing.
 But I believe a global is better, because if other functions want to
 know whether translation is enabled, this global variable is a good
 choice.

OK, I don't insist on this. But I think we have no obligation to
consider the future usage of a function or variable. We often see a
static function redefined as non-static when it needs to be used in
another file, and similarly for variables. It's fine to change
it when that's really needed.

Anyway, I am fine with this. Just for now it's a little uncomfortable to
me.

Thanks
Baoquan


Re: [PATCH v11 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

2015-05-12 Thread Li, Zhen-Hua
To fix the dmar fault seen when running v10 with the latest upstream version, I added a
flush in the new function iommu_context_addr.

Thanks 
Zhenhua

 
 This patchset is an update of Bill Sumner's patchset, implements a fix for:
 If a kernel boots with intel_iommu=on on a system that supports intel vt-d, 
 when a panic happens, the kdump kernel will boot with these faults:
 
   dmar: DRHD: handling fault status reg 102
   dmar: DMAR:[DMA Read] Request device [01:00.0] fault addr fff8
   DMAR:[fault reason 01] Present bit in root entry is clear
 
   dmar: DRHD: handling fault status reg 2
   dmar: INTR-REMAP: Request device [[61:00.0] fault index 42
   INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
 
 On some systems, the interrupt remapping fault will also happen even if
 intel_iommu is not set to on, because interrupt remapping will be enabled
 when x2apic is needed by the system.
 
 The cause of the DMA fault is described in Bill's original version, and the
 INTR-Remap fault is caused by a similar reason. In short, the initialization
 of the vt-d drivers causes the in-flight DMA and interrupt requests to get the
 wrong response.
 
 To fix this problem, we modify the behavior of the intel vt-d in the
 crashdump kernel:
 
 For DMA Remapping:
 1. To accept the vt-d hardware in an active state,
 2. Do not disable and re-enable the translation, keep it enabled.
 3. Use the old root entry table, do not rewrite the RTA register.
 4. Malloc and use new context entry table, copy data from the old ones that
  used by the old kernel.
 5. Keep using the old page tables before driver is loaded.
 6. After device driver is loaded, when it issues the first dma_map command, 
  free the dmar_domain structure for this device, and generate a new one, so 
  that the device can be assigned a new and empty page table. 
 7. When a new context entry table is generated, we also save its address to 
  the old root entry table.
 
 For Interrupt Remapping:
 1. To accept the vt-d hardware in an active state,
 2. Do not disable and re-enable the interrupt remapping, keep it enabled.
 3. Use the old interrupt remapping table, do not rewrite the IRTA register.
 4. When ioapic entry is setup, the interrupt remapping table is changed, and 
  the updated data will be stored to the old interrupt remapping table.
 
 Advantages of this approach:
 1. All manipulation of the IO-device is done by the Linux device-driver
  for that device.
 2. This approach behaves in a manner very similar to operation without an
  active iommu.
 3. Any activity between the IO-device and its RMRR areas is handled by the
  device-driver in the same manner as during a non-kdump boot.
 4. If an IO-device has no driver in the kdump kernel, it is simply left alone.
  This supports the practice of creating a special kdump kernel without
  drivers for any devices that are not required for taking a crashdump. 
 5. Minimal code-changes among the existing mainline intel vt-d code.
 
 Summary of changes in this patch set:
 1. Added some useful function for root entry table in code intel-iommu.c
 2. Added new members to struct root_entry and struct irte;
 3. Functions to load old root entry table to iommu->root_entry from the 
 memory 
  of old kernel.
 4. Functions to malloc new context entry table and copy the data from the old
  ones to the malloced new ones.
 5. Functions to enable support for DMA remapping in kdump kernel.
 6. Functions to load old irte data from the old kernel to the kdump kernel.
 7. Some code changes that support other behaviours that have been listed.
 8. In the new functions, use physical address as unsigned long type, not 
  pointers.
 
 Original version by Bill Sumner:
   https://lkml.org/lkml/2014/1/10/518
   https://lkml.org/lkml/2014/4/15/716
   https://lkml.org/lkml/2014/4/24/836
 
 Zhenhua's updates:
   https://lkml.org/lkml/2014/10/21/134
   https://lkml.org/lkml/2014/12/15/121
   https://lkml.org/lkml/2014/12/22/53
   https://lkml.org/lkml/2015/1/6/1166
   https://lkml.org/lkml/2015/1/12/35
   https://lkml.org/lkml/2015/3/19/33
   https://lkml.org/lkml/2015/4/10/135
 
 Changelog[v11]:
   1. Fix some conflicts with the latest upstream kernel.
  Add flush in iommu_context_addr
 
 Changelog[v10]:
   1. Do not use CONFIG_CRASH_DUMP and is_kdump_kernel().
  Use one flag which stores the te and ir status in last kernel:
 iommu->pre_enabled_trans
 iommu->pre_enabled_ir
 
 Changelog[v9]:
   1. Add new function iommu_attach_domain_with_id.
   2. Do not copy old page tables, keep using the old ones.
   3. Remove functions:
  intel_iommu_did_to_domain_values_entry
  intel_iommu_get_dids_from_old_kernel
  device_to_domain_id
  copy_page_addr
  copy_page_table
  copy_context_entry
  copy_context_entry_table
   4. Add new function device_to_existing_context_entry.
 
 Changelog[v8]:
   1. Add a missing __iommu_flush_cache in function copy_page_table.
 
 

Re: [PATCH v11 07/10] iommu/vt-d: enable kdump support in iommu module

2015-05-12 Thread Dave Young
On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
 Modify the operation of the following functions when called during crash dump:
 iommu_context_addr
 free_context_table
 get_domain_for_dev
 init_dmars
 intel_iommu_init
 
 Bill Sumner:
 Original version.
 
 Zhenhua:
 The name of new calling functions.
 Do not disable and re-enable TE in kdump kernel.
 Use the did and gaw from old context entry;
 
 Signed-off-by: Bill Sumner billsumnerli...@gmail.com
 Signed-off-by: Li, Zhen-Hua zhen-h...@hp.com
 ---
  drivers/iommu/intel-iommu.c | 95 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------
  1 file changed, 83 insertions(+), 12 deletions(-)
 
 diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
 index 28c3c64..91545bf 100644
 --- a/drivers/iommu/intel-iommu.c
 +++ b/drivers/iommu/intel-iommu.c
 @@ -397,6 +397,7 @@ static int copy_root_entry_table(struct intel_iommu *iommu);
  static int intel_iommu_load_translation_tables(struct intel_iommu *iommu);
  
  static void iommu_check_pre_te_status(struct intel_iommu *iommu);
 +static u8 g_translation_pre_enabled;
  
  /*
   * This domain is a statically identity mapping domain.
 @@ -794,6 +795,9 @@ static inline struct context_entry *iommu_context_addr(struct intel_iommu *iommu
   phy_addr = virt_to_phys((void *)context);
   *entry = phy_addr | 1;
   __iommu_flush_cache(iommu, entry, sizeof(*entry));
 +
 + if (iommu->pre_enabled_trans)
 + __iommu_update_old_root_entry(iommu, bus);
   }
  return &context[devfn];
  }
 @@ -887,13 +891,15 @@ static void clear_context_table(struct intel_iommu *iommu, u8 bus, u8 devfn)
  
  static void free_context_table(struct intel_iommu *iommu)
  {
 + struct root_entry *root = NULL;
   int i;
   unsigned long flags;
   struct context_entry *context;
  
  spin_lock_irqsave(&iommu->lock, flags);
  if (!iommu->root_entry) {
 - goto out;
 + spin_unlock_irqrestore(&iommu->lock, flags);
 + return;
  }
  for (i = 0; i < ROOT_ENTRY_NR; i++) {
  context = iommu_context_addr(iommu, i, 0, 0);
 @@ -908,10 +914,23 @@ static void free_context_table(struct intel_iommu *iommu)
   free_pgtable_page(context);
  
   }
 +
 + if (iommu->pre_enabled_trans) {
 + iommu->root_entry_old_phys = 0;
 + root = iommu->root_entry_old_virt;
 + iommu->root_entry_old_virt = NULL;
 + }
 +
  free_pgtable_page(iommu->root_entry);
  iommu->root_entry = NULL;
 -out:
 +
  spin_unlock_irqrestore(&iommu->lock, flags);
 +
 + /* We do this outside of spin_lock/unlock because iounmap() may
 +  * cause an error if called with the spin lock held;
 +  */
 + if (iommu->pre_enabled_trans)
 + iounmap(root);
  }
  
  static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
 @@ -2333,6 +2352,7 @@ static struct dmar_domain *get_domain_for_dev(struct device *dev, int gaw)
   unsigned long flags;
   u8 bus, devfn;
   int did = -1;   /* Default to no domain_id supplied */
 + struct context_entry *ce = NULL;
  
   domain = find_domain(dev);
   if (domain)
 @@ -2366,6 +2386,20 @@ static struct dmar_domain *get_domain_for_dev(struct device *dev, int gaw)
   domain = alloc_domain(0);
   if (!domain)
   return NULL;
 +
 + if (iommu->pre_enabled_trans) {
 + /*
 +  * if this device had a did in the old kernel
 +  * use its values instead of generating new ones
 +  */
 + ce = device_to_existing_context_entry(iommu, bus, devfn);
 +
 + if (ce) {
 + did = context_domain_id(ce);
 + gaw = agaw_to_width(context_address_width(ce));
 + }
 + }
 +
  domain->id = iommu_attach_domain_with_id(domain, iommu, did);
  if (domain->id < 0) {
   free_domain_mem(domain);
 @@ -2897,6 +2931,7 @@ static int __init init_dmars(void)
   goto free_g_iommus;
   }
  
 + g_translation_pre_enabled = 0; /* To know whether to skip RMRR */
   for_each_active_iommu(iommu, drhd) {
  g_iommus[iommu->seq_id] = iommu;
  
 @@ -2904,14 +2939,30 @@ static int __init init_dmars(void)
   if (ret)
   goto free_iommu;
  
 - /*
 -  * TBD:
 -  * we could share the same root & context tables
 -  * among all IOMMU's. Need to Split it later.
 -  */
 - ret = iommu_alloc_root_entry(iommu);
 - if (ret)
 - goto free_iommu;
 + iommu_check_pre_te_status(iommu);
 + if (iommu->pre_enabled_trans) {
 + pr_info("IOMMU Copying translate tables from panicked kernel\n");
 + ret = intel_iommu_load_translation_tables(iommu);
 +

Re: [PATCH v11 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

2015-05-12 Thread Dave Young
On 05/11/15 at 05:52pm, Li, Zhen-Hua wrote:
 This patchset is an update of Bill Sumner's patchset, implements a fix for:
 If a kernel boots with intel_iommu=on on a system that supports intel vt-d, 
 when a panic happens, the kdump kernel will boot with these faults:
 
 dmar: DRHD: handling fault status reg 102
 dmar: DMAR:[DMA Read] Request device [01:00.0] fault addr fff8
 DMAR:[fault reason 01] Present bit in root entry is clear
 
 dmar: DRHD: handling fault status reg 2
 dmar: INTR-REMAP: Request device [[61:00.0] fault index 42
 INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
 
 On some systems, the interrupt remapping fault will also happen even if
 intel_iommu is not set to on, because interrupt remapping will be enabled
 when x2apic is needed by the system.
 
 The cause of the DMA fault is described in Bill's original version, and the
 INTR-Remap fault is caused by a similar reason. In short, the initialization
 of the vt-d drivers causes the in-flight DMA and interrupt requests to get the
 wrong response.
 
 To fix this problem, we modify the behavior of the intel vt-d in the
 crashdump kernel:
 
 For DMA Remapping:
 1. To accept the vt-d hardware in an active state,
 2. Do not disable and re-enable the translation, keep it enabled.
 3. Use the old root entry table, do not rewrite the RTA register.
 4. Malloc and use new context entry table, copy data from the old ones that
used by the old kernel.
 5. Keep using the old page tables before driver is loaded.
 6. After device driver is loaded, when it issues the first dma_map command, 
free the dmar_domain structure for this device, and generate a new one, so 
that the device can be assigned a new and empty page table. 
 7. When a new context entry table is generated, we also save its address to 
the old root entry table.
 
 For Interrupt Remapping:
 1. To accept the vt-d hardware in an active state,
 2. Do not disable and re-enable the interrupt remapping, keep it enabled.
 3. Use the old interrupt remapping table, do not rewrite the IRTA register.
 4. When ioapic entry is setup, the interrupt remapping table is changed, and 
the updated data will be stored to the old interrupt remapping table.
 
 Advantages of this approach:
 1. All manipulation of the IO-device is done by the Linux device-driver
for that device.
 2. This approach behaves in a manner very similar to operation without an
active iommu.
 3. Any activity between the IO-device and its RMRR areas is handled by the
device-driver in the same manner as during a non-kdump boot.
 4. If an IO-device has no driver in the kdump kernel, it is simply left alone.
This supports the practice of creating a special kdump kernel without
drivers for any devices that are not required for taking a crashdump. 
 5. Minimal code-changes among the existing mainline intel vt-d code.
 
 Summary of changes in this patch set:
 1. Added some useful function for root entry table in code intel-iommu.c
 2. Added new members to struct root_entry and struct irte;
 3. Functions to load old root entry table to iommu->root_entry from the 
 memory 
of old kernel.
 4. Functions to malloc new context entry table and copy the data from the old
ones to the malloced new ones.
 5. Functions to enable support for DMA remapping in kdump kernel.
 6. Functions to load old irte data from the old kernel to the kdump kernel.
 7. Some code changes that support other behaviours that have been listed.
 8. In the new functions, use physical address as unsigned long type, not 
pointers.
 
 Original version by Bill Sumner:
 https://lkml.org/lkml/2014/1/10/518
 https://lkml.org/lkml/2014/4/15/716
 https://lkml.org/lkml/2014/4/24/836
 
 Zhenhua's updates:
 https://lkml.org/lkml/2014/10/21/134
 https://lkml.org/lkml/2014/12/15/121
 https://lkml.org/lkml/2014/12/22/53
 https://lkml.org/lkml/2015/1/6/1166
 https://lkml.org/lkml/2015/1/12/35
 https://lkml.org/lkml/2015/3/19/33
 https://lkml.org/lkml/2015/4/10/135
 
 Changelog[v11]:
 1. Fix some conflicts with the latest upstream kernel.
Add flush in iommu_context_addr
 
 Changelog[v10]:
 1. Do not use CONFIG_CRASH_DUMP and is_kdump_kernel().
Use one flag which stores the te and ir status in last kernel:
 iommu->pre_enabled_trans
 iommu->pre_enabled_ir
 
 Changelog[v9]:
 1. Add new function iommu_attach_domain_with_id.
 2. Do not copy old page tables, keep using the old ones.
 3. Remove functions:
intel_iommu_did_to_domain_values_entry
intel_iommu_get_dids_from_old_kernel
device_to_domain_id
copy_page_addr
copy_page_table
copy_context_entry
copy_context_entry_table
 4. Add new function device_to_existing_context_entry.
 
 Changelog[v8]:
 1. Add a missing __iommu_flush_cache in function copy_page_table.
 
 

Re: [PATCH/RFC 0/4] Probe deferral for IOMMU DT integration

2015-05-12 Thread Laurent Pinchart
Hi Marek,

On Wednesday 04 March 2015 10:20:36 Marek Szyprowski wrote:
 On 2015-02-16 17:14, Laurent Pinchart wrote:
  On Thursday 05 February 2015 16:31:58 Laura Abbott wrote:
  Hi,
  
  On the heels of Exynos integrating SMMU in to the DT for probing,
  this series attempts to add support to make SMMU drivers work with
  deferred probing. This has been referenced before[1] but this is
  some actual code to use as a starting point for discussion.
  
  The requirement for this is based on a previous patch to add clock
  support to the ARM SMMU driver[2]. Once we have clock support, it's
  possible that the driver itself may need to be defered which breaks
  a bunch of assumptions about how SMMU probing is supposed to work.
  The concept here is to have the driver call of_dma_configure which
  may return -EPROBE_DEFER. of_dma_configure could possibly be moved
  up to probe level. The existing method of initialization still needs
  to be done as well which might mean we have the worst of both worlds.
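
Purely as an illustration of that concept (hypothetical driver, and assuming an of_dma_configure() variant that reports -EPROBE_DEFER as proposed in this thread):

static int foo_probe(struct platform_device *pdev)
{
	int ret;

	/* Ask the core to set up dma_map_ops; defer if the IOMMU isn't ready. */
	ret = of_dma_configure(&pdev->dev, pdev->dev.of_node);
	if (ret)
		return ret;	/* -EPROBE_DEFER retries once the IOMMU has probed */

	/* ... normal bus-master driver initialisation ... */
	return 0;
}
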
  
  Open questions:
  - This still doesn't really address Arnd's concerns[3] about disabling
  IOMMUs
  
  Arnd, Will and I have discussed IOMMU probe deferral last week. Here's a
  summary of the discussion, before the details blur away.
  
  The discussion covered both higher level concepts and lower level details,
  in various directions. I'll try to make the summary clear by filling the
  gaps between pieces where needed, so some of the information below might
  not be a direct result of the discussions. Arnd and Will, please feel
  free to correct me.
  
  The first question we've discussed was whether probe deferral for IOMMUs
  is really needed. Various dependencies between IOMMU devices and other
  devices exist, in particular on clocks (as you have mentioned above) and
  on power domains (as mentioned by Marek in
   http://lists.infradead.org/pipermail/linux-arm-kernel/2015-February/323238.html).
   While there are mechanisms to handle some of them with probe
  deferral (for instance by using the OF_DECLARE macros to register clock
  drivers), generalizing those mechanisms would essentially recreate a
  probe ordering mechanism similar to link order probe ordering and
  couldn't really scale.
  
  Additionally, IOMMUs could also be present hot-pluggable devices and
  depend on resources that are thus hot-plugged. OF_DECLARE wouldn't help
  in that case. For all those reasons we have concluded that probe deferral
  for IOMMUs is desired if it can be implemented cleanly.
  
  The second question we've discussed was how to implement probe deferral
  cleanly :-)
  
  The current implementation configures DMA at device creation time and sets
  the following properties:
  
  - dma_map_ops (IOMMU, SWIOTLB, linear mapping)
  - initial DMA mask
  - DMA offset
  - DMA coherency
  
  Additionally the IOMMU group (when an IOMMU is present) will also be
  configured at the same time (patches are under review).
  
  The base idea is to defer probing of bus master devices when their IOMMU
  isn't present. Three cases need to be handled.
  
  1. No IOMMU is declared by the firmware (through DT, ACPI, ...) for the
  bus master device. The bus master device probe doesn't need to be deferred
  due to the IOMMU. dma_map_ops must be set to SWIOTLB or linear mapping
  (the later should likely be implemented through SWIOTLB).
  
  2. An IOMMU is declared for the bus master device and the IOMMU has been
  successfully probed and registered. The bus master device probe doesn't
  need to be deferred due to the IOMMU. dma_map_ops must be set to IOMMU
  ops.
  
  3. An IOMMU is declared for the bus master device but the IOMMU isn't
  registered. This can be caused by different reasons:
  
  - a. No driver is loaded for this IOMMU (for instance because DT describes
  the IOMMU connection, but the IOMMU driver hasn't been developed yet, or
  an older kernel is used). If the IOMMU is optional the bus master device
  probe should succeed, and dma_map_ops should be set to linear. If the
  IOMMU is mandatory the bus master device probe should fail.
  
  Note that, as we require IOMMU drivers to be compiled in, we don't need to
  handle the case of loadable IOMMU drivers that haven't been loaded yet.
  
  - b. A driver is loaded for this IOMMU but the IOMMU hasn't been probed
  yet, or its probe has been deferred. The bus master device probe should
  be deferred.
  
  - c. A driver is loaded for this IOMMU but the IOMMU probe has failed
  permanently. It's not clear at the moment whether we should try to recover
  from this automatically using the same mechanism as case 3.a, or if we can
  considered this as an abnormal failure and disable the bus master device.
  
  Assuming that we should try to recover from such an error, we can't
  predict this case when the device is instantiated (even if IOMMUs are
  registered before bus master devices are added, for instance using the
  OF_DECLARE mechanism that Will has implemented). We thus 

Re: [PATCH v6 00/25] Exynos SYSMMU (IOMMU) integration with DT and DMA-mapping subsystem

2015-05-12 Thread Javier Martinez Canillas
On Mon, May 11, 2015 at 6:00 PM, Javier Martinez Canillas
jav...@dowhile0.org wrote:
 Hello Marek,

 On Mon, May 4, 2015 at 10:15 AM, Marek Szyprowski
 m.szyprow...@samsung.com wrote:
 Hello Everyone,

  This is yet another attempt to get the Exynos SYSMMU driver integrated
  with the IOMMU & DMA-mapping subsystems. The main change from the previous
  version is the addition of the patches to define iommu-mapping, which needs
  to be created during system boot to avoid IOMMU faults by devices which
  have been left enabled by the bootloader (i.e. a framebuffer displaying a splash
  screen).

 Patches has been also rebased onto v4.1-rc2 with 'arm: dma-mapping: fix
 off-by-one check in arm_setup_iommu_dma_ops' patch applied (see commit
 1424532b2163bf1580f4b1091a5801e12310fac5 on fixes branch in
 git://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-arm.git,
 more information:
 http://www.spinics.net/lists/arm-kernel/msg414722.html).

 All patches are also available in the following git repository:
 https://git.linaro.org/people/marek.szyprowski/linux-srpol.git
 branch v4.1-exynos-iommu.


 Thanks for the new series, this time I didn't get a system hang when I
 enabled both CONFIG_DRM_EXYNOS_FIMD and CONFIG_DRM_EXYNOS_DP on my
 Exynos5420 Peach Pit Chromebook.

 The system finished booting and I have both a console on the HDMI and
 eDP/LVDS displays. But if I try to start X or weston, I again have a
 completele system hang with no output on the serial console. This
 works correctly without your IOMMU series.

 And again disabling CONFIG_DRM_EXYNOS_DP or CONFIG_DRM_EXYNOS_FIMD
 avoids the system to hang. If I only have CONFIG_DRM_EXYNOS_HDMI
 enabled, then I can start X with out issues and is displayed correctly
 in the HDMI output.

 So it seems your workaround for the boot-loader leaving the FIMD dma
 engine enabled is not enough? I'm not a graphics person so I can't think
 off the top of my head of what could trigger the system hang when X or
 weston are executed, so I wondered if you have any ideas.


I looked a bit more into this issue today and found that the system
hangs when dma_alloc_attrs() is called from lowlevel_buffer_allocate()
in drivers/gpu/drm/exynos/exynos_drm_buf.c [0].

The call chain is:

exynos_drm_gem_create_ioctl() -> exynos_drm_gem_create() ->
exynos_drm_alloc_buf() -> lowlevel_buffer_allocate() ->
dma_alloc_attrs().

An interesting data point is that the allocation succeeds when a GEM
buffer is created for fbdev but it causes a system hang when is
created from the DRM ioctl API. For fbdev the call chain is:

exynos_drm_fbdev_init() -> exynos_drm_fbdev_init() ->
exynos_drm_fbdev_create() -> exynos_drm_gem_create() ->
exynos_drm_alloc_buf() -> lowlevel_buffer_allocated() ->
dma_alloc_attrs().

Although for fbdev, the EXYNOS_BO_CONTIG flag is passed to
exynos_drm_gem_create() so the DMA_ATTR_FORCE_CONTIGUOUS attribute is
set before calling dma_alloc_attrs(). So I don't know if it is only
working because the allocated memory is physically contiguous, so the
IOMMU mapping is less of an issue here?
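
For reference, a minimal sketch of what the EXYNOS_BO_CONTIG path boils down to with the dma_attrs-based API of this kernel generation (illustrative helper, not the driver's actual code):

#include <linux/dma-mapping.h>
#include <linux/dma-attrs.h>

static void *alloc_contig_dma_buffer(struct device *dev, size_t size,
				     dma_addr_t *dma_addr)
{
	DEFINE_DMA_ATTRS(attrs);

	/* Ask the DMA-mapping layer for physically contiguous memory. */
	dma_set_attr(DMA_ATTR_FORCE_CONTIGUOUS, &attrs);

	return dma_alloc_attrs(dev, size, dma_addr, GFP_KERNEL, &attrs);
}
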

I also don't know if it is relevant, but I noticed that both for HDMI and
DP, the address of the allocated DMA buffer is always 0x2000.
Which is the start of the iommu-reserved-mapping region for the
exynos5420 peach pit.

While the address of the allocated DMA buffer for the GEM buffer
allocated with EXYNOS_GEM_CREATE is always != 0x2000 (i.e
0x2184c000).

So in summary, Exynos HDMI DRM can allocate GEM buffers correctly for
fbdev and DRM, while Exynos DRM DP/FIMD can only allocate a GEM buffer
for fbdev; trying to allocate with the EXYNOS_GEM_CREATE ioctl hangs
the system. Both fbdev and DRM work correctly with the Exynos IOMMU
disabled.

I included some DRM debug log [2] (drm.debug=0xff) for DP/FIMD fbdev
and DRM GEM buffer creation.

Thanks a lot and best regards,
Javier

[0]: 
http://lxr.free-electrons.com/source/drivers/gpu/drm/exynos/exynos_drm_buf.c#L84
[1]: 
http://lxr.free-electrons.com/source/drivers/gpu/drm/exynos/exynos_drm_fbdev.c#L153
[2]:

DP fbdev:

[4.181241] [drm:drm_enable_connectors] connector 31 enabled? yes
[4.181248] [drm:drm_target_preferred] looking for cmdline mode on
connector 25
[4.181254] [drm:drm_target_preferred] looking for preferred mode
on connector 25 0
[4.181259] [drm:drm_target_preferred] found mode 1366x768
[4.181265] [drm:drm_target_preferred] looking for cmdline mode on
connector 31
[4.181270] [drm:drm_target_preferred] looking for preferred mode
on connector 31 0
[4.181276] [drm:drm_target_preferred] found mode 1920x1080
[4.181281] [drm:drm_setup_crtcs] picking CRTCs for 4096x4096 config
[4.181290] [drm:drm_setup_crtcs] desired mode 1366x768 set on crtc 23 (0,0)
[4.181298] [drm:drm_setup_crtcs] desired mode 1920x1080 set on crtc 29 (0,0)
[4.181307] [drm:exynos_drm_fbdev_create] surface width(1920),
height(1080) and bpp(32
[4.181314] [drm:exynos_drm_init_buf] desired size = 0x7e9000
[4.181327] [drm:exynos_drm_gem_init] created file object =