Re: [PATCH v8 02/10] iommu/vt-d: Items required for kdump

2015-01-13 Thread Li, ZhenHua

On 01/12/2015 11:22 PM, Joerg Roedel wrote:

On Mon, Jan 12, 2015 at 03:06:20PM +0800, Li, Zhen-Hua wrote:

+
+#ifdef CONFIG_CRASH_DUMP
+
+/*
+ * Fix Crashdump failure caused by leftover DMA through a hardware IOMMU
+ *
+ * Fixes the crashdump kernel to deal with an active iommu and legacy
+ * DMA from the (old) panicked kernel in a manner similar to how legacy
+ * DMA is handled when no hardware iommu was in use by the old kernel --
+ * allow the legacy DMA to continue into its current buffers.
+ *
+ * In the crashdump kernel, this code:
+ * 1. skips disabling the IOMMU's translating of IO Virtual Addresses (IOVA).
+ * 2. Do not re-enable IOMMU's translating.
+ * 3. In kdump kernel, use the old root entry table.
+ * 4. Leaves the current translations in-place so that legacy DMA will
+ *continue to use its current buffers.
+ * 5. Allocates to the device drivers in the crashdump kernel
+ *portions of the iova address ranges that are different
+ *from the iova address ranges that were being used by the old kernel
+ *at the time of the panic.
+ *
+ */


It looks like you are still copying the io-page-tables from the old
kernel into the kdump kernel, is that right? With the approach that was
proposed you only need to copy over the context entries 1-1. They are
still pointing to the page-tables in the old kernels memory (which is
just fine).

The root-entry of the old kernel is also re-used, and when the kdump
kernel starts to use a device, its context entry is updated to point to
a newly allocated page-table.



Current status:
1. Use old root entry table,
2. Copy old context entry tables.
3. Copy old page tables.
4. Allocate new page table when device driver is loading.

I remember the progress we discussed was this. Checking the old mails,
founding we did not clearly discuss about the old page tables, only
agrred with allocating new page tables when device driver is loading.

To me, both of these two options are fine, copying old page tables, or
re-use the old ones. My debugging shows both of them works fine.

If we use the old page tables, the patchset will use less code.
Functions copy_page_addr, copy_page_table are no longer needed, and
functions copy_context_entry and copy_context_entry_table will be
reduced to several lines. The patchset will be much simpler.

Zhenhua


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 02/10] iommu/vt-d: Items required for kdump

2015-01-13 Thread Baoquan He
On 01/12/15 at 10:29am, Vivek Goyal wrote:
 On Mon, Jan 12, 2015 at 04:22:08PM +0100, Joerg Roedel wrote:
  It looks like you are still copying the io-page-tables from the old
  kernel into the kdump kernel, is that right? With the approach that was
  proposed you only need to copy over the context entries 1-1. They are
  still pointing to the page-tables in the old kernels memory (which is
  just fine).
 
 Kdump has the notion of backup region. Where certain parts of old kernels
 memory can be moved to a different location (first 640K on x86 as of now)
 and new kernel can make use of this memory now.

Hi Vivek,

About backup region I am a bit confusing. Just say x86, we usually copy
it to a backup region. And this first 640K will be used as a usable
memory region in 2nd kernel since its content has been copied to backup
region. And that backup region is taken from crashkernel reserved
memory and not passed to 2nd kernel as usable memory region.

Here did you mean the old page table could fall into first 640K memory
region or that reserved backup region?  Since as I understand the backup
region is taken from crashkernel memory which is not used by 1st kernel
process.

Thanks
Baoquan

 
 So we will have to just make sure that no parts of this old page table
 fall into backup region.
 
 Thanks
 Vivek
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 02/10] iommu/vt-d: Items required for kdump

2015-01-13 Thread Joerg Roedel
On Tue, Jan 13, 2015 at 04:12:29PM +0800, Li, ZhenHua wrote:

 Current status:
 1. Use old root entry table,
 2. Copy old context entry tables.
 3. Copy old page tables.
 4. Allocate new page table when device driver is loading.
 
 I remember the progress we discussed was this. Checking the old mails,
 founding we did not clearly discuss about the old page tables, only
 agrred with allocating new page tables when device driver is loading.
 
 To me, both of these two options are fine, copying old page tables, or
 re-use the old ones. My debugging shows both of them works fine.
 
 If we use the old page tables, the patchset will use less code.
 Functions copy_page_addr, copy_page_table are no longer needed, and
 functions copy_context_entry and copy_context_entry_table will be
 reduced to several lines. The patchset will be much simpler.

Yes, please do it this way. There is no point in copying the old
page-tables. We never make changes to them, so they can stay where they
are.

The problem with the backup region still exists, but it is not
specific to IOMMUs. This problem and should be addressed seperatly.


Joerg

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 02/10] iommu/vt-d: Items required for kdump

2015-01-12 Thread Joerg Roedel
On Mon, Jan 12, 2015 at 03:06:20PM +0800, Li, Zhen-Hua wrote:
 +
 +#ifdef CONFIG_CRASH_DUMP
 +
 +/*
 + * Fix Crashdump failure caused by leftover DMA through a hardware IOMMU
 + *
 + * Fixes the crashdump kernel to deal with an active iommu and legacy
 + * DMA from the (old) panicked kernel in a manner similar to how legacy
 + * DMA is handled when no hardware iommu was in use by the old kernel --
 + * allow the legacy DMA to continue into its current buffers.
 + *
 + * In the crashdump kernel, this code:
 + * 1. skips disabling the IOMMU's translating of IO Virtual Addresses (IOVA).
 + * 2. Do not re-enable IOMMU's translating.
 + * 3. In kdump kernel, use the old root entry table.
 + * 4. Leaves the current translations in-place so that legacy DMA will
 + *continue to use its current buffers.
 + * 5. Allocates to the device drivers in the crashdump kernel
 + *portions of the iova address ranges that are different
 + *from the iova address ranges that were being used by the old kernel
 + *at the time of the panic.
 + *
 + */

It looks like you are still copying the io-page-tables from the old
kernel into the kdump kernel, is that right? With the approach that was
proposed you only need to copy over the context entries 1-1. They are
still pointing to the page-tables in the old kernels memory (which is
just fine).

The root-entry of the old kernel is also re-used, and when the kdump
kernel starts to use a device, its context entry is updated to point to
a newly allocated page-table.


Joerg

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 02/10] iommu/vt-d: Items required for kdump

2015-01-12 Thread Vivek Goyal
On Mon, Jan 12, 2015 at 04:22:08PM +0100, Joerg Roedel wrote:
 On Mon, Jan 12, 2015 at 03:06:20PM +0800, Li, Zhen-Hua wrote:
  +
  +#ifdef CONFIG_CRASH_DUMP
  +
  +/*
  + * Fix Crashdump failure caused by leftover DMA through a hardware IOMMU
  + *
  + * Fixes the crashdump kernel to deal with an active iommu and legacy
  + * DMA from the (old) panicked kernel in a manner similar to how legacy
  + * DMA is handled when no hardware iommu was in use by the old kernel --
  + * allow the legacy DMA to continue into its current buffers.
  + *
  + * In the crashdump kernel, this code:
  + * 1. skips disabling the IOMMU's translating of IO Virtual Addresses 
  (IOVA).
  + * 2. Do not re-enable IOMMU's translating.
  + * 3. In kdump kernel, use the old root entry table.
  + * 4. Leaves the current translations in-place so that legacy DMA will
  + *continue to use its current buffers.
  + * 5. Allocates to the device drivers in the crashdump kernel
  + *portions of the iova address ranges that are different
  + *from the iova address ranges that were being used by the old kernel
  + *at the time of the panic.
  + *
  + */
 
 It looks like you are still copying the io-page-tables from the old
 kernel into the kdump kernel, is that right? With the approach that was
 proposed you only need to copy over the context entries 1-1. They are
 still pointing to the page-tables in the old kernels memory (which is
 just fine).

Kdump has the notion of backup region. Where certain parts of old kernels
memory can be moved to a different location (first 640K on x86 as of now)
and new kernel can make use of this memory now.

So we will have to just make sure that no parts of this old page table
fall into backup region.

Thanks
Vivek
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 02/10] iommu/vt-d: Items required for kdump

2015-01-12 Thread Vivek Goyal
On Mon, Jan 12, 2015 at 05:06:46PM +0100, Joerg Roedel wrote:
 On Mon, Jan 12, 2015 at 10:29:19AM -0500, Vivek Goyal wrote:
  Kdump has the notion of backup region. Where certain parts of old kernels
  memory can be moved to a different location (first 640K on x86 as of now)
  and new kernel can make use of this memory now.
  
  So we will have to just make sure that no parts of this old page table
  fall into backup region.
 
 Uuh, looks like the 'iommu-with-kdump-issue' isn't complicated enough
 yet ;)
 Sadly, your above statement is true for all hardware-accessible data
 structures in IOMMU code. I think about how we can solve this, is there
 an easy way to allocate memory that is not in any backup region?

Hmm..., there does not seem to be any easy way to do this. In fact, as of
now, kernel does not even know where is backup region. All these details are
managed by user space completely (except for new kexec_file_load() syscall).

That means we are left with ugly options now.

- Define per arch kexec backup regions in kernel and export it to user
  space and let kexec-tools make use of that deinition (instead of
  defining its own). That way memory allocation code in kernel can look
  at this backup area and skip it for certain allocations.

Thanks
Vivek
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 02/10] iommu/vt-d: Items required for kdump

2015-01-12 Thread Joerg Roedel
On Mon, Jan 12, 2015 at 10:29:19AM -0500, Vivek Goyal wrote:
 Kdump has the notion of backup region. Where certain parts of old kernels
 memory can be moved to a different location (first 640K on x86 as of now)
 and new kernel can make use of this memory now.
 
 So we will have to just make sure that no parts of this old page table
 fall into backup region.

Uuh, looks like the 'iommu-with-kdump-issue' isn't complicated enough
yet ;)
Sadly, your above statement is true for all hardware-accessible data
structures in IOMMU code. I think about how we can solve this, is there
an easy way to allocate memory that is not in any backup region?

Thanks,

Joerg

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 02/10] iommu/vt-d: Items required for kdump

2015-01-12 Thread Joerg Roedel
On Mon, Jan 12, 2015 at 11:15:38AM -0500, Vivek Goyal wrote:
 On Mon, Jan 12, 2015 at 05:06:46PM +0100, Joerg Roedel wrote:
  On Mon, Jan 12, 2015 at 10:29:19AM -0500, Vivek Goyal wrote:
   Kdump has the notion of backup region. Where certain parts of old kernels
   memory can be moved to a different location (first 640K on x86 as of now)
   and new kernel can make use of this memory now.
   
   So we will have to just make sure that no parts of this old page table
   fall into backup region.
  
  Uuh, looks like the 'iommu-with-kdump-issue' isn't complicated enough
  yet ;)
  Sadly, your above statement is true for all hardware-accessible data
  structures in IOMMU code. I think about how we can solve this, is there
  an easy way to allocate memory that is not in any backup region?
 
 Hmm..., there does not seem to be any easy way to do this. In fact, as of
 now, kernel does not even know where is backup region. All these details are
 managed by user space completely (except for new kexec_file_load() syscall).
 
 That means we are left with ugly options now.
 
 - Define per arch kexec backup regions in kernel and export it to user
   space and let kexec-tools make use of that deinition (instead of
   defining its own). That way memory allocation code in kernel can look
   at this backup area and skip it for certain allocations.

Yes, that makes sense. In fact, I think all allocations for DMA memory
need to take this into account to avoid potentially serious data
corruption.
If any memory for a disk superblock gets allocated in backup memory and
a kdump happens, the new kernel might zero out that area and the disk
controler then writes the zeroes to disk instead of the superblock.


Joerg

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v8 02/10] iommu/vt-d: Items required for kdump

2015-01-11 Thread Li, Zhen-Hua
Add structure type domain_values_entry used for kdump;
Add context entry functions needed for kdump.

Bill Sumner:
Original version;

Li, Zhenhua:
Changed the name of new functions, make them consistent with current
context get/set functions.

Signed-off-by: Bill Sumner billsumnerli...@gmail.com
Signed-off-by: Li, Zhen-Hua zhen-h...@hp.com
---
 drivers/iommu/intel-iommu.c | 70 +
 1 file changed, 70 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 8d5c400..a71de3f 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -40,6 +40,7 @@
 #include linux/pci-ats.h
 #include linux/memblock.h
 #include linux/dma-contiguous.h
+#include linux/crash_dump.h
 #include asm/irq_remapping.h
 #include asm/cacheflush.h
 #include asm/iommu.h
@@ -208,6 +209,12 @@ get_context_addr_from_root(struct root_entry *root)
NULL);
 }
 
+static inline unsigned long
+get_context_phys_from_root(struct root_entry *root)
+{
+   return  root_present(root) ? (root-val  VTD_PAGE_MASK) : 0;
+}
+
 /*
  * low 64 bits:
  * 0: present
@@ -228,6 +235,32 @@ static inline bool context_present(struct context_entry 
*context)
 {
return (context-lo  1);
 }
+
+static inline int context_fault_enable(struct context_entry *c)
+{
+   return((c-lo  1)  0x1);
+}
+
+static inline int context_translation_type(struct context_entry *c)
+{
+   return((c-lo  2)  0x3);
+}
+
+static inline u64 context_address_root(struct context_entry *c)
+{
+   return((c-lo  VTD_PAGE_SHIFT));
+}
+
+static inline int context_address_width(struct context_entry *c)
+{
+   return((c-hi  0)  0x7);
+}
+
+static inline int context_domain_id(struct context_entry *c)
+{
+   return((c-hi  8)  0x);
+}
+
 static inline void context_set_present(struct context_entry *context)
 {
context-lo |= 1;
@@ -313,6 +346,43 @@ static inline int first_pte_in_page(struct dma_pte *pte)
return !((unsigned long)pte  ~VTD_PAGE_MASK);
 }
 
+
+#ifdef CONFIG_CRASH_DUMP
+
+/*
+ * Fix Crashdump failure caused by leftover DMA through a hardware IOMMU
+ *
+ * Fixes the crashdump kernel to deal with an active iommu and legacy
+ * DMA from the (old) panicked kernel in a manner similar to how legacy
+ * DMA is handled when no hardware iommu was in use by the old kernel --
+ * allow the legacy DMA to continue into its current buffers.
+ *
+ * In the crashdump kernel, this code:
+ * 1. skips disabling the IOMMU's translating of IO Virtual Addresses (IOVA).
+ * 2. Do not re-enable IOMMU's translating.
+ * 3. In kdump kernel, use the old root entry table.
+ * 4. Leaves the current translations in-place so that legacy DMA will
+ *continue to use its current buffers.
+ * 5. Allocates to the device drivers in the crashdump kernel
+ *portions of the iova address ranges that are different
+ *from the iova address ranges that were being used by the old kernel
+ *at the time of the panic.
+ *
+ */
+
+struct domain_values_entry {
+   struct list_head link;  /* link entries into a list */
+   struct iova_domain iovad;   /* iova's that belong to this domain */
+   struct dma_pte  *pgd;   /* virtual address */
+   intdid; /* domain id */
+   intgaw; /* max guest address width */
+   intiommu_superpage; /* Level of superpages supported:
+  0 == 4KiB (no superpages), 1 == 2MiB,
+  2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
+};
+
+#endif /* CONFIG_CRASH_DUMP */
+
 /*
  * This domain is a statically identity mapping domain.
  * 1. This domain creats a static 1:1 mapping to all usable memory.
-- 
2.0.0-rc0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu