RE: [PATCHv3 0/6] Crashdump Accepting Active IOMMU

2014-04-25 Thread Sumner, William
  1 Hi Don,
  2 It seems that we agree in the area of keeping the iommu active and 
using it to
  3 isolate the legacy DMA.  We also agree that the device's IO driver 
should be
  4 the software that deals with the device (e.g.: resets it, etc.)
  5
  6 It also seems that there are several differences in our two approaches 
to how
  7 the iommu is then used to accomplish this common goal.
  8
  9 More detailed replies are among your comments below.
 10
 11 On Monday 4/7/2014: Don Dutile wrote:
 12 >On 01/10/2014 05:07 PM, Bill Sumner wrote:
 13 >> v2->v3:
 14 >> 1. Commented-out "#define DEBUG 1" to eliminate debug messages
 15 >> 2. Updated the comments about changes in each version in all patches 
in the set.
 16 >> 3. Fixed: one-line added to Copy-Translations" patch to initialize 
the iovad
 17 >>struct as recommended by Baoquan He [b...@redhat.com]
 18 >>init_iova_domain(>iovad, DMA_32BIT_PFN);
 19 >>
 20 >> v1->v2:
 21 >> The following series implements a fix for:
 22 >> A kdump problem about DMA that has been discussed for a long time. 
That is,
 23 >> when a kernel panics and boots into the kdump kernel, DMA started by 
the
 24 >> panicked kernel is not stopped before the kdump kernel is booted and 
the
 25 >> kdump kernel disables the IOMMU while this DMA continues.  This 
causes the
 26 >> IOMMU to stop translating the DMA addresses as IOVAs and begin to 
treat them
 27 >> as physical memory addresses -- which causes the DMA to either:
 28 >> (1) generate DMAR errors or (2) generate PCI SERR errors or (3) 
transfer
 29 >> data to or from incorrect areas of memory. Often this causes the 
dump to fail.
 30 >>
 31 >> This patch set modifies the behavior of the iommu in the (new) 
crashdump kernel:
 32 >> 1. to accept the iommu hardware in an active state,
 33 >> 2. to leave the current translations in-place so that legacy DMA 
will continue
 34 >> using its current buffers until the device drivers in the 
crashdump kernel
 35 >> initialize and initialize their devices,
 36 >> 3. to use different portions of the iova address ranges for the 
device drivers
 37 >> in the crashdump kernel than the iova ranges that were in-use at 
the time
 38 >> of the panic.
 39 >>
 40 >> Advantages of this approach:
 41 >> 1. All manipulation of the IO-device is done by the Linux 
device-driver
 42 >> for that device.
 43 >> 2. This approach behaves in a manner very similar to operation 
without an
 44 >> active iommu.
 45 >
 46 >Sorry to be late to the game finally getting out of a deep hole &
 47 >asked to look at this proposal...
 48 >
 49 >Along this concept -- similar to operation without an active iommu --
 50 >have you considered the following:
 51 >a) if (this is crash kernel), turn *off* DMAR faults;
 52 I did look at turning off DMAR faults and, although I chose not to 
disable
 53 them in my current patch submissions, I think that this is still an 
open area
 54 to explore -- especially for any cases where DMAR faults were already 
occurring
 55 in the panicked kernel. We may want to explore expanding the patches to 
cover the
 56 case where legacy DMAR faults are still occurring in the kdump kernel.
 57 I would recommend tackling this as a separate follow-on effort.
 58 In testing my patches, I have seen no new DMAR faults introduced by 
them.
 59 In fact, the transition is seamless, leaving no time window where (for
 60 instance) RMRR DMA would fail.
 61
 62 >b) if (this is crash kernel), isolate all device DMA in IOMMU
 63 I agree, the iommu is a great resource to isolate device DMA into safe 
memory
 64 areas -- we should leave it active and use it.
 65
 66 >b) as second kernel configures each device, have each device to use 
IOMMU hw-passthrough,
 67 >  i.e., the equivalent of having no IOMMU for the second, kexec'd 
kernel
 68 >   but, having the benefit of keeping all the other (potentially bad) 
devices
 69 >   sequestered / isolated, until they are initialized & re-configured 
in the second kernel,
 70 >*if at all* -- note: kexec'd kernels may not enable/configure all 
devices that
 71 >existed in the first kernel (Bill: I'm sure you know this, but 
others may not).
 72 >
 73 >RMRR's that were previously setup could continue to work if they are 
skipped in step (b),
 74 >unless the device has gone mad/bad.  In that case, re-parsing the RMRR 
may or may not
 75 >clear up the issue.
 76 >
 77 >Additionally, a tidbit of information like "some servers force NMI's 
on DMAR faults,
 78 >and cause a system reset, thereby, preventing a kdump to occur"
 79 

RE: [PATCHv3 0/6] Crashdump Accepting Active IOMMU

2014-04-25 Thread Sumner, William
  1 Hi Don,
  2 It seems that we agree in the area of keeping the iommu active and 
using it to
  3 isolate the legacy DMA.  We also agree that the device's IO driver 
should be
  4 the software that deals with the device (e.g.: resets it, etc.)
  5
  6 It also seems that there are several differences in our two approaches 
to how
  7 the iommu is then used to accomplish this common goal.
  8
  9 More detailed replies are among your comments below.
 10
 11 On Monday 4/7/2014: Don Dutile wrote:
 12 On 01/10/2014 05:07 PM, Bill Sumner wrote:
 13  v2-v3:
 14  1. Commented-out #define DEBUG 1 to eliminate debug messages
 15  2. Updated the comments about changes in each version in all patches 
in the set.
 16  3. Fixed: one-line added to Copy-Translations patch to initialize 
the iovad
 17 struct as recommended by Baoquan He [b...@redhat.com]
 18 init_iova_domain(domain-iovad, DMA_32BIT_PFN);
 19 
 20  v1-v2:
 21  The following series implements a fix for:
 22  A kdump problem about DMA that has been discussed for a long time. 
That is,
 23  when a kernel panics and boots into the kdump kernel, DMA started by 
the
 24  panicked kernel is not stopped before the kdump kernel is booted and 
the
 25  kdump kernel disables the IOMMU while this DMA continues.  This 
causes the
 26  IOMMU to stop translating the DMA addresses as IOVAs and begin to 
treat them
 27  as physical memory addresses -- which causes the DMA to either:
 28  (1) generate DMAR errors or (2) generate PCI SERR errors or (3) 
transfer
 29  data to or from incorrect areas of memory. Often this causes the 
dump to fail.
 30 
 31  This patch set modifies the behavior of the iommu in the (new) 
crashdump kernel:
 32  1. to accept the iommu hardware in an active state,
 33  2. to leave the current translations in-place so that legacy DMA 
will continue
 34  using its current buffers until the device drivers in the 
crashdump kernel
 35  initialize and initialize their devices,
 36  3. to use different portions of the iova address ranges for the 
device drivers
 37  in the crashdump kernel than the iova ranges that were in-use at 
the time
 38  of the panic.
 39 
 40  Advantages of this approach:
 41  1. All manipulation of the IO-device is done by the Linux 
device-driver
 42  for that device.
 43  2. This approach behaves in a manner very similar to operation 
without an
 44  active iommu.
 45 
 46 Sorry to be late to the game finally getting out of a deep hole 
 47 asked to look at this proposal...
 48 
 49 Along this concept -- similar to operation without an active iommu --
 50 have you considered the following:
 51 a) if (this is crash kernel), turn *off* DMAR faults;
 52 I did look at turning off DMAR faults and, although I chose not to 
disable
 53 them in my current patch submissions, I think that this is still an 
open area
 54 to explore -- especially for any cases where DMAR faults were already 
occurring
 55 in the panicked kernel. We may want to explore expanding the patches to 
cover the
 56 case where legacy DMAR faults are still occurring in the kdump kernel.
 57 I would recommend tackling this as a separate follow-on effort.
 58 In testing my patches, I have seen no new DMAR faults introduced by 
them.
 59 In fact, the transition is seamless, leaving no time window where (for
 60 instance) RMRR DMA would fail.
 61
 62 b) if (this is crash kernel), isolate all device DMA in IOMMU
 63 I agree, the iommu is a great resource to isolate device DMA into safe 
memory
 64 areas -- we should leave it active and use it.
 65
 66 b) as second kernel configures each device, have each device to use 
IOMMU hw-passthrough,
 67   i.e., the equivalent of having no IOMMU for the second, kexec'd 
kernel
 68but, having the benefit of keeping all the other (potentially bad) 
devices
 69sequestered / isolated, until they are initialized  re-configured 
in the second kernel,
 70 *if at all* -- note: kexec'd kernels may not enable/configure all 
devices that
 71 existed in the first kernel (Bill: I'm sure you know this, but 
others may not).
 72 
 73 RMRR's that were previously setup could continue to work if they are 
skipped in step (b),
 74 unless the device has gone mad/bad.  In that case, re-parsing the RMRR 
may or may not
 75 clear up the issue.
 76 
 77 Additionally, a tidbit of information like some servers force NMI's 
on DMAR faults,
 78 and cause a system reset, thereby, preventing a kdump to occur
 79 should have been included as one reason to stop DMAR faults from 
occurring on kex

Re: [PATCHv3 0/6] Crashdump Accepting Active IOMMU

2014-04-08 Thread Don Dutile

On 04/08/2014 12:14 PM, David Woodhouse wrote:

On Mon, 2014-04-07 at 16:43 -0400, Don Dutile wrote:


Additionally, a tidbit of information like "some servers force NMI's
on DMAR faults,
and cause a system reset, thereby, preventing a kdump to occur"
should have been included as one reason to stop DMAR faults from
occurring on kexec-boot,
in addition to the fact that a flood of them can lock up a system.


How about allocating a physical scratch page, and setting up a mapping
for each device such that *every* virtual address (apart from those
listed in RMRRs, perhaps) is mapped to that same scratch page?

That way you avoid the faults, but you also avoid stray DMA to parts of
the system that you don't want to get corrupted.


+1... more isolation as second kernel booting sounds good.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv3 0/6] Crashdump Accepting Active IOMMU

2014-04-08 Thread David Woodhouse
On Mon, 2014-04-07 at 16:43 -0400, Don Dutile wrote:
> 
> Additionally, a tidbit of information like "some servers force NMI's
> on DMAR faults,
> and cause a system reset, thereby, preventing a kdump to occur"
> should have been included as one reason to stop DMAR faults from
> occurring on kexec-boot,
> in addition to the fact that a flood of them can lock up a system.

How about allocating a physical scratch page, and setting up a mapping
for each device such that *every* virtual address (apart from those
listed in RMRRs, perhaps) is mapped to that same scratch page?

That way you avoid the faults, but you also avoid stray DMA to parts of
the system that you don't want to get corrupted.

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation


smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCHv3 0/6] Crashdump Accepting Active IOMMU

2014-04-08 Thread David Woodhouse
On Mon, 2014-04-07 at 16:43 -0400, Don Dutile wrote:
 
 Additionally, a tidbit of information like some servers force NMI's
 on DMAR faults,
 and cause a system reset, thereby, preventing a kdump to occur
 should have been included as one reason to stop DMAR faults from
 occurring on kexec-boot,
 in addition to the fact that a flood of them can lock up a system.

How about allocating a physical scratch page, and setting up a mapping
for each device such that *every* virtual address (apart from those
listed in RMRRs, perhaps) is mapped to that same scratch page?

That way you avoid the faults, but you also avoid stray DMA to parts of
the system that you don't want to get corrupted.

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation


smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCHv3 0/6] Crashdump Accepting Active IOMMU

2014-04-08 Thread Don Dutile

On 04/08/2014 12:14 PM, David Woodhouse wrote:

On Mon, 2014-04-07 at 16:43 -0400, Don Dutile wrote:


Additionally, a tidbit of information like some servers force NMI's
on DMAR faults,
and cause a system reset, thereby, preventing a kdump to occur
should have been included as one reason to stop DMAR faults from
occurring on kexec-boot,
in addition to the fact that a flood of them can lock up a system.


How about allocating a physical scratch page, and setting up a mapping
for each device such that *every* virtual address (apart from those
listed in RMRRs, perhaps) is mapped to that same scratch page?

That way you avoid the faults, but you also avoid stray DMA to parts of
the system that you don't want to get corrupted.


+1... more isolation as second kernel booting sounds good.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv3 0/6] Crashdump Accepting Active IOMMU

2014-04-07 Thread Don Dutile

On 01/10/2014 05:07 PM, Bill Sumner wrote:

v2->v3:
1. Commented-out "#define DEBUG 1" to eliminate debug messages
2. Updated the comments about changes in each version in all patches in the set.
3. Fixed: one-line added to Copy-Translations" patch to initialize the iovad
   struct as recommended by Baoquan He [b...@redhat.com]
   init_iova_domain(>iovad, DMA_32BIT_PFN);

v1->v2:
The following series implements a fix for:
A kdump problem about DMA that has been discussed for a long time. That is,
when a kernel panics and boots into the kdump kernel, DMA started by the
panicked kernel is not stopped before the kdump kernel is booted and the
kdump kernel disables the IOMMU while this DMA continues.  This causes the
IOMMU to stop translating the DMA addresses as IOVAs and begin to treat them
as physical memory addresses -- which causes the DMA to either:
(1) generate DMAR errors or (2) generate PCI SERR errors or (3) transfer
data to or from incorrect areas of memory. Often this causes the dump to fail.

This patch set modifies the behavior of the iommu in the (new) crashdump kernel:
1. to accept the iommu hardware in an active state,
2. to leave the current translations in-place so that legacy DMA will continue
using its current buffers until the device drivers in the crashdump kernel
initialize and initialize their devices,
3. to use different portions of the iova address ranges for the device drivers
in the crashdump kernel than the iova ranges that were in-use at the time
of the panic.

Advantages of this approach:
1. All manipulation of the IO-device is done by the Linux device-driver
for that device.
2. This approach behaves in a manner very similar to operation without an
active iommu.


Sorry to be late to the game finally getting out of a deep hole &
asked to look at this proposal...

Along this concept -- similar to operation without an active iommu --
have you considered the following:
a) if (this is crash kernel), turn *off* DMAR faults;
b) if (this is crash kernel), isolate all device DMA in IOMMU
b) as second kernel configures each device, have each device to use IOMMU 
hw-passthrough,
 i.e., the equivalent of having no IOMMU for the second, kexec'd kernel
  but, having the benefit of keeping all the other (potentially bad) devices
  sequestered / isolated, until they are initialized & re-configured in the 
second kernel,
   *if at all* -- note: kexec'd kernels may not enable/configure all devices 
that
   existed in the first kernel (Bill: I'm sure you know this, but others may 
not).

RMRR's that were previously setup could continue to work if they are skipped in 
step (b),
unless the device has gone mad/bad.  In that case, re-parsing the RMRR may or 
may not
clear up the issue.

Additionally, a tidbit of information like "some servers force NMI's on DMAR 
faults,
and cause a system reset, thereby, preventing a kdump to occur"
should have been included as one reason to stop DMAR faults from occurring on 
kexec-boot,
in addition to the fact that a flood of them can lock up a system.

Again, just turning off DMAR fault reporting for the 'if (this is crash 
kernel)',
short-term workaround sounds a whole lot less expensive to implement, as well as
'if (this is crash kernel), force hw-passthrough'.

If the IO devices are borked to the point that they won't complete DMA properly,
with or without IOMMU, the system is dead anyhow, game over.

Finally, copying the previous IOMMU state to the second kernel, and hoping
that the cause of the kernel crash wasn't an errant DMA (e.g., due to a device 
going bad,
or it's DMA-state being corrupted & causing an improper IO), is omitting an 
important failure
case/space.
Keeping the first-kernel DMA isolated (IOMMU on, all translations off, all DMAR 
faults off),
and then allowing each device (driver) configuration to sanely reset the device 
&
start a new (hw-passthrough) domain seems simpler and cleaner, for this 
dump-and-run kernel
effort.

- Don


3. Any activity between the IO-device and its RMRR areas is handled by the
device-driver in the same manner as during a non-kdump boot.
4. If an IO-device has no driver in the kdump kernel, it is simply left alone.
This supports the practice of creating a special kdump kernel without
drivers for any devices that are not required for taking a crashdump.

Changes since the RFC version of this patch:
1. Consolidated all of the operational code into the "copy..." functions.
The "process..." functions were primarily used for diagnostics and
exploration; however, there was a small amount of operational code that
used the "process..." functions.
This operational code has been moved into the "copy..." functions.

2. Removed the "Process ..." functions and the diagnostic code that ran
on that function set.  This removed about 1/4 of the code -- which this
operational patch set no longer needs.  These portions of the RFC patch
could be 

Re: [PATCHv3 0/6] Crashdump Accepting Active IOMMU

2014-04-07 Thread Don Dutile

On 01/10/2014 05:07 PM, Bill Sumner wrote:

v2-v3:
1. Commented-out #define DEBUG 1 to eliminate debug messages
2. Updated the comments about changes in each version in all patches in the set.
3. Fixed: one-line added to Copy-Translations patch to initialize the iovad
   struct as recommended by Baoquan He [b...@redhat.com]
   init_iova_domain(domain-iovad, DMA_32BIT_PFN);

v1-v2:
The following series implements a fix for:
A kdump problem about DMA that has been discussed for a long time. That is,
when a kernel panics and boots into the kdump kernel, DMA started by the
panicked kernel is not stopped before the kdump kernel is booted and the
kdump kernel disables the IOMMU while this DMA continues.  This causes the
IOMMU to stop translating the DMA addresses as IOVAs and begin to treat them
as physical memory addresses -- which causes the DMA to either:
(1) generate DMAR errors or (2) generate PCI SERR errors or (3) transfer
data to or from incorrect areas of memory. Often this causes the dump to fail.

This patch set modifies the behavior of the iommu in the (new) crashdump kernel:
1. to accept the iommu hardware in an active state,
2. to leave the current translations in-place so that legacy DMA will continue
using its current buffers until the device drivers in the crashdump kernel
initialize and initialize their devices,
3. to use different portions of the iova address ranges for the device drivers
in the crashdump kernel than the iova ranges that were in-use at the time
of the panic.

Advantages of this approach:
1. All manipulation of the IO-device is done by the Linux device-driver
for that device.
2. This approach behaves in a manner very similar to operation without an
active iommu.


Sorry to be late to the game finally getting out of a deep hole 
asked to look at this proposal...

Along this concept -- similar to operation without an active iommu --
have you considered the following:
a) if (this is crash kernel), turn *off* DMAR faults;
b) if (this is crash kernel), isolate all device DMA in IOMMU
b) as second kernel configures each device, have each device to use IOMMU 
hw-passthrough,
 i.e., the equivalent of having no IOMMU for the second, kexec'd kernel
  but, having the benefit of keeping all the other (potentially bad) devices
  sequestered / isolated, until they are initialized  re-configured in the 
second kernel,
   *if at all* -- note: kexec'd kernels may not enable/configure all devices 
that
   existed in the first kernel (Bill: I'm sure you know this, but others may 
not).

RMRR's that were previously setup could continue to work if they are skipped in 
step (b),
unless the device has gone mad/bad.  In that case, re-parsing the RMRR may or 
may not
clear up the issue.

Additionally, a tidbit of information like some servers force NMI's on DMAR 
faults,
and cause a system reset, thereby, preventing a kdump to occur
should have been included as one reason to stop DMAR faults from occurring on 
kexec-boot,
in addition to the fact that a flood of them can lock up a system.

Again, just turning off DMAR fault reporting for the 'if (this is crash 
kernel)',
short-term workaround sounds a whole lot less expensive to implement, as well as
'if (this is crash kernel), force hw-passthrough'.

If the IO devices are borked to the point that they won't complete DMA properly,
with or without IOMMU, the system is dead anyhow, game over.

Finally, copying the previous IOMMU state to the second kernel, and hoping
that the cause of the kernel crash wasn't an errant DMA (e.g., due to a device 
going bad,
or it's DMA-state being corrupted  causing an improper IO), is omitting an 
important failure
case/space.
Keeping the first-kernel DMA isolated (IOMMU on, all translations off, all DMAR 
faults off),
and then allowing each device (driver) configuration to sanely reset the device 

start a new (hw-passthrough) domain seems simpler and cleaner, for this 
dump-and-run kernel
effort.

- Don


3. Any activity between the IO-device and its RMRR areas is handled by the
device-driver in the same manner as during a non-kdump boot.
4. If an IO-device has no driver in the kdump kernel, it is simply left alone.
This supports the practice of creating a special kdump kernel without
drivers for any devices that are not required for taking a crashdump.

Changes since the RFC version of this patch:
1. Consolidated all of the operational code into the copy... functions.
The process... functions were primarily used for diagnostics and
exploration; however, there was a small amount of operational code that
used the process... functions.
This operational code has been moved into the copy... functions.

2. Removed the Process ... functions and the diagnostic code that ran
on that function set.  This removed about 1/4 of the code -- which this
operational patch set no longer needs.  These portions of the RFC patch
could be formatted as a 

FW: Re: [PATCHv3 0/6] Crashdump Accepting Active IOMMU

2014-03-12 Thread Sumner, William
Resending as plain-text.
Bill

From: Sumner, William 
Sent: Wednesday, March 12, 2014 11:15 AM
To: 'Vivek Goyal'
Cc: dw...@infradead.org; indou.ta...@jp.fujitsu.com; b...@redhat.com; 
j...@8bytes.org; linux-...@vger.kernel.org; ke...@lists.infradead.org; 
alex.william...@redhat.com; linux-kernel@vger.kernel.org; 
io...@lists.linux-foundation.org; ddut...@redhat.com; Hatch, Douglas B (HPS 
Linux PM); ishii.hiron...@jp.fujitsu.com; bhelg...@google.com; Li, Zhen-Hua
Subject: Re: [PATCHv3 0/6] Crashdump Accepting Active IOMMU

On Mon, Mar 10, 2014 at 04:43PM, Vivek Goyal wrote:
>On Fri, Jan 10, 2014 at 03:07:26PM -0700, Bill Sumner wrote:
>
>[..]
>> This patch set modifies the behavior of the iommu in the (new) crashdump 
>> kernel:
>> 1. to accept the iommu hardware in an active state, 2. to leave the
>> current translations in-place so that legacy DMA will continue
>>    using its current buffers until the device drivers in the crashdump 
>>kernel>>    initialize and initialize their devices, 3. to use different
>> portions of the iova address ranges for the device drivers
>>    in the crashdump kernel than the iova ranges that were in-use at the time
>>    of the panic.
>
>Conceptually, above makes sense to me. I have few queries.
>
>- Do we need to pass any kind of data from first kernel to second kernel,
>  like table size etc? Calgary IOMMU was using first kernel's tables
>  also and it was determining previous kernel's table size using saved_max_pfn.

Yes. We need to pass the intel-iommu translation tables from the first kernel 
to the second kernel - well, to allow the second kernel to read them.  Only the 
tree of tables that the intel-iommu hardware references are needed: the 
root-entry table, the context-entry tables, and the page-translation tables.  
This patch involves only the intel-iommu -- and none of the other iommu 
hardware types.

During the early initialization of the new kdump kernel this patch copies 
(duplicates) the tree of iommu translation tables from the old kernel into the 
new kernel.  These tables are all linked together as a tree by physical memory 
addresses. The physical address of the top of the tree -- the root-entry table 
-- is obtained from a register in the iommu hardware. The copy operation 
traverses the tree in depth-first order, sanity-checking each table and then 
duplicating it into the kernel address space of the new kernel, linking the new 
tables appropriately.  If the copy succeeds then the iommu register is updated 
to point to the copy in the new kernel and the iommu continues translating DMA 
requests from IO devices.  If a sanity-check fails, the patch currently 
errors-out of the dump (might want to revisit this error case and see if there 
is something better to do.)

By copying the tables into the new kernel, all of the existing code in 
intel-iommu.c continues to work nearly unchanged, with only a few added 
initializations of fields when kdump is active and when intel-iommu.c creates 
its "bookkeeping" structures for the device -- to make sure they correspond to 
the contents of the translate tables. I made a determined effort to avoid 
changing the existing execution path for the non-kdump case, and to minimize 
the changes even when kdump is active.  

Note also that this leaves the copy of the translate tables in the old panicked 
kernel unused and unchanged during the operation of 'makedumpfile', so they 
appear correctly in the dump as they were at the time of the panic.

>
>- I don't know how IOMMU translation tables look like, but are new DMA
>  zones setup by drivers in second kernel part of same table?

Yes. The copied translation tables in the kdump kernel contain both the 
translations that were active at the time of the dump (which will never go away 
until the dump is finished and the platform is rebooted), plus any translations 
needed by the kdump kernel (which come and go as necessary).  As the 
"bookkeeping" structures are created in the new kernel (typically the first 
time that a device requests a translation for a buffer) they are initialized 
such that all IOVAs that already exist in the translate tables for tht device 
are marked as "not available for allocation" so all requests by device drivers 
in the new kernel receive IOVAs that were not in-use at the time of the panic.

> How do we
>  make sure that sufficient space is available.

So far I have not seen any problem with running out of space because of this 
copy;  however, I believe that this may be a valid concern with very large IO 
configurations -- and it deserves some attention as the community reviews and 
tests this patch.

> I am not sure if possible
>  table corruption from crashed kernel is an issue here or not.

Any corrupted translation tables from the crashed kernel would be a problem 
that would prevent using copies o

FW: Re: [PATCHv3 0/6] Crashdump Accepting Active IOMMU

2014-03-12 Thread Sumner, William
Resending as plain-text.
Bill

From: Sumner, William 
Sent: Wednesday, March 12, 2014 11:15 AM
To: 'Vivek Goyal'
Cc: dw...@infradead.org; indou.ta...@jp.fujitsu.com; b...@redhat.com; 
j...@8bytes.org; linux-...@vger.kernel.org; ke...@lists.infradead.org; 
alex.william...@redhat.com; linux-kernel@vger.kernel.org; 
io...@lists.linux-foundation.org; ddut...@redhat.com; Hatch, Douglas B (HPS 
Linux PM); ishii.hiron...@jp.fujitsu.com; bhelg...@google.com; Li, Zhen-Hua
Subject: Re: [PATCHv3 0/6] Crashdump Accepting Active IOMMU

On Mon, Mar 10, 2014 at 04:43PM, Vivek Goyal wrote:
On Fri, Jan 10, 2014 at 03:07:26PM -0700, Bill Sumner wrote:

[..]
 This patch set modifies the behavior of the iommu in the (new) crashdump 
 kernel:
 1. to accept the iommu hardware in an active state, 2. to leave the
 current translations in-place so that legacy DMA will continue
    using its current buffers until the device drivers in the crashdump 
kernel    initialize and initialize their devices, 3. to use different
 portions of the iova address ranges for the device drivers
    in the crashdump kernel than the iova ranges that were in-use at the time
    of the panic.

Conceptually, above makes sense to me. I have few queries.

- Do we need to pass any kind of data from first kernel to second kernel,
  like table size etc? Calgary IOMMU was using first kernel's tables
  also and it was determining previous kernel's table size using saved_max_pfn.

Yes. We need to pass the intel-iommu translation tables from the first kernel 
to the second kernel - well, to allow the second kernel to read them.  Only the 
tree of tables that the intel-iommu hardware references are needed: the 
root-entry table, the context-entry tables, and the page-translation tables.  
This patch involves only the intel-iommu -- and none of the other iommu 
hardware types.

During the early initialization of the new kdump kernel this patch copies 
(duplicates) the tree of iommu translation tables from the old kernel into the 
new kernel.  These tables are all linked together as a tree by physical memory 
addresses. The physical address of the top of the tree -- the root-entry table 
-- is obtained from a register in the iommu hardware. The copy operation 
traverses the tree in depth-first order, sanity-checking each table and then 
duplicating it into the kernel address space of the new kernel, linking the new 
tables appropriately.  If the copy succeeds then the iommu register is updated 
to point to the copy in the new kernel and the iommu continues translating DMA 
requests from IO devices.  If a sanity-check fails, the patch currently 
errors-out of the dump (might want to revisit this error case and see if there 
is something better to do.)

By copying the tables into the new kernel, all of the existing code in 
intel-iommu.c continues to work nearly unchanged, with only a few added 
initializations of fields when kdump is active and when intel-iommu.c creates 
its bookkeeping structures for the device -- to make sure they correspond to 
the contents of the translate tables. I made a determined effort to avoid 
changing the existing execution path for the non-kdump case, and to minimize 
the changes even when kdump is active.  

Note also that this leaves the copy of the translate tables in the old panicked 
kernel unused and unchanged during the operation of 'makedumpfile', so they 
appear correctly in the dump as they were at the time of the panic.


- I don't know how IOMMU translation tables look like, but are new DMA
  zones setup by drivers in second kernel part of same table?

Yes. The copied translation tables in the kdump kernel contain both the 
translations that were active at the time of the dump (which will never go away 
until the dump is finished and the platform is rebooted), plus any translations 
needed by the kdump kernel (which come and go as necessary).  As the 
bookkeeping structures are created in the new kernel (typically the first 
time that a device requests a translation for a buffer) they are initialized 
such that all IOVAs that already exist in the translate tables for tht device 
are marked as not available for allocation so all requests by device drivers 
in the new kernel receive IOVAs that were not in-use at the time of the panic.

 How do we
  make sure that sufficient space is available.

So far I have not seen any problem with running out of space because of this 
copy;  however, I believe that this may be a valid concern with very large IO 
configurations -- and it deserves some attention as the community reviews and 
tests this patch.

 I am not sure if possible
  table corruption from crashed kernel is an issue here or not.

Any corrupted translation tables from the crashed kernel would be a problem 
that would prevent using copies of them. I hope and expect that this happens 
rarely -- and that we would catch it during the copy when it does.

Fortunately, these tables are only manipulated by the code in intel

Re: [PATCHv3 0/6] Crashdump Accepting Active IOMMU

2014-03-10 Thread Vivek Goyal
On Fri, Jan 10, 2014 at 03:07:26PM -0700, Bill Sumner wrote:

[..]
> This patch set modifies the behavior of the iommu in the (new) crashdump 
> kernel: 
> 1. to accept the iommu hardware in an active state, 
> 2. to leave the current translations in-place so that legacy DMA will continue
>using its current buffers until the device drivers in the crashdump kernel
>initialize and initialize their devices,
> 3. to use different portions of the iova address ranges for the device drivers
>in the crashdump kernel than the iova ranges that were in-use at the time
>of the panic.  

Conceptually, above makes sense to me. I have few queries.

- Do we need to pass any kind of data from first kernel to second kernel, 
  like table size etc? Calgary IOMMU was using first kernel's tables
  also and it was determining previous kernel's table size using saved_max_pfn.

- I don't know how IOMMU translation tables look like, but are new DMA
  zones setup by drivers in second kernel part of same table? How do we
  make sure that sufficient space is available. I am not sure if possible
  table corruption from crashed kernel is an issue here or not.

In general, I think changelogs of these patches need to be little better.
There seem to be lot of text and still I can't quickly wrap my head around
what a particular patch is supposed to be doing.

But we definitely need to fix this issue. IOMMU issues with kdump have
been troubling us for a very long time.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCHv3 0/6] Crashdump Accepting Active IOMMU

2014-03-10 Thread Sumner, William
Hi Joerg,

On Tue 3/4/2014 9:06 AM, Joerg Roedel wrote:
>
>On Fri, Jan 10, 2014 at 03:07:26PM -0700, Bill Sumner wrote:
>> Bill Sumner (6):
>>   Crashdump-Accepting-Active-IOMMU-Flags-and-Prototype
>>   Crashdump-Accepting-Active-IOMMU-Utility-functions
>>   Crashdump-Accepting-Active-IOMMU-Domain-Interfaces
>>   Crashdump-Accepting-Active-IOMMU-Copy-Translations
>>   Crashdump-Accepting-Active-IOMMU-Debug-Print-IOMMU
>>   Crashdump-Accepting-Active-IOMMU-Call-From-Mainline
>>
>>  drivers/iommu/intel-iommu.c | 1293
>> ---
>>  1 file changed, 1225 insertions(+), 68 deletions(-)
>
>This is a pretty big change, before merging I would like to get more eyes on 
>it. Please make sure you get more reviews on the code and the general concept. 
>A few things I noticed while looking at it:
Yes, as I was developing this, I realized that it is far more code than most 
Linux submissions.  In addition to more eyes reviewing the code, I hope to see 
additional members of the community trying it out -- testing it on a wide 
variety of system configurations.  Let me encourage additional reviewers and 
testers to look at this fix to crashdump.
>
>* You put a lot of code into #define CONFIG_CRASH_DUMP. Isn't the same 
>approach necessary to for KEXEC?
Until I saw your email, I had been focused only on the immediate problem of 
fixing crashdump and had not considered using this approach in the more-general 
context of KEXEC.  It certainly looks like it could be useful with KEXEC as 
well.  

I would like to take-on the effort of expanding this technique into KEXEC as a 
separate follow-on project so that the original "fix crashdump" project is not 
delayed.  On a quick look, I believe that the amount of additional code would 
be minimal; but the expansion of the testing matrix might be quite large.
>
>* Please get rid of all the pr_debug stuff.
Will do.
>
>* I think it makes sense to put the functions to access the IOMMU
>  configuration values of the old kernel into a new file. This is better
>  than adding over 1000 new lines to intel-iommu.c
Sounds good.  I will move these functions into a new file.
>
>* I have seen your new single-patch for this change. A single patch with
>  more than 1200 changed lines is not something anyone is willing to
>  review. Please split the patches again in a meaningful and bisectable
>  way.
I will return to a multiple-patch set for future submissions.


-- Bill
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCHv3 0/6] Crashdump Accepting Active IOMMU

2014-03-10 Thread Sumner, William
Hi Joerg,

On Tue 3/4/2014 9:06 AM, Joerg Roedel wrote:

On Fri, Jan 10, 2014 at 03:07:26PM -0700, Bill Sumner wrote:
 Bill Sumner (6):
   Crashdump-Accepting-Active-IOMMU-Flags-and-Prototype
   Crashdump-Accepting-Active-IOMMU-Utility-functions
   Crashdump-Accepting-Active-IOMMU-Domain-Interfaces
   Crashdump-Accepting-Active-IOMMU-Copy-Translations
   Crashdump-Accepting-Active-IOMMU-Debug-Print-IOMMU
   Crashdump-Accepting-Active-IOMMU-Call-From-Mainline

  drivers/iommu/intel-iommu.c | 1293
 ---
  1 file changed, 1225 insertions(+), 68 deletions(-)

This is a pretty big change, before merging I would like to get more eyes on 
it. Please make sure you get more reviews on the code and the general concept. 
A few things I noticed while looking at it:
Yes, as I was developing this, I realized that it is far more code than most 
Linux submissions.  In addition to more eyes reviewing the code, I hope to see 
additional members of the community trying it out -- testing it on a wide 
variety of system configurations.  Let me encourage additional reviewers and 
testers to look at this fix to crashdump.

* You put a lot of code into #define CONFIG_CRASH_DUMP. Isn't the same 
approach necessary to for KEXEC?
Until I saw your email, I had been focused only on the immediate problem of 
fixing crashdump and had not considered using this approach in the more-general 
context of KEXEC.  It certainly looks like it could be useful with KEXEC as 
well.  

I would like to take-on the effort of expanding this technique into KEXEC as a 
separate follow-on project so that the original fix crashdump project is not 
delayed.  On a quick look, I believe that the amount of additional code would 
be minimal; but the expansion of the testing matrix might be quite large.

* Please get rid of all the pr_debug stuff.
Will do.

* I think it makes sense to put the functions to access the IOMMU
  configuration values of the old kernel into a new file. This is better
  than adding over 1000 new lines to intel-iommu.c
Sounds good.  I will move these functions into a new file.

* I have seen your new single-patch for this change. A single patch with
  more than 1200 changed lines is not something anyone is willing to
  review. Please split the patches again in a meaningful and bisectable
  way.
I will return to a multiple-patch set for future submissions.


-- Bill
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv3 0/6] Crashdump Accepting Active IOMMU

2014-03-10 Thread Vivek Goyal
On Fri, Jan 10, 2014 at 03:07:26PM -0700, Bill Sumner wrote:

[..]
 This patch set modifies the behavior of the iommu in the (new) crashdump 
 kernel: 
 1. to accept the iommu hardware in an active state, 
 2. to leave the current translations in-place so that legacy DMA will continue
using its current buffers until the device drivers in the crashdump kernel
initialize and initialize their devices,
 3. to use different portions of the iova address ranges for the device drivers
in the crashdump kernel than the iova ranges that were in-use at the time
of the panic.  

Conceptually, above makes sense to me. I have few queries.

- Do we need to pass any kind of data from first kernel to second kernel, 
  like table size etc? Calgary IOMMU was using first kernel's tables
  also and it was determining previous kernel's table size using saved_max_pfn.

- I don't know how IOMMU translation tables look like, but are new DMA
  zones setup by drivers in second kernel part of same table? How do we
  make sure that sufficient space is available. I am not sure if possible
  table corruption from crashed kernel is an issue here or not.

In general, I think changelogs of these patches need to be little better.
There seem to be lot of text and still I can't quickly wrap my head around
what a particular patch is supposed to be doing.

But we definitely need to fix this issue. IOMMU issues with kdump have
been troubling us for a very long time.

Thanks
Vivek
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv3 0/6] Crashdump Accepting Active IOMMU

2014-03-04 Thread Joerg Roedel
Hi Bill,

On Fri, Jan 10, 2014 at 03:07:26PM -0700, Bill Sumner wrote:
> Bill Sumner (6):
>   Crashdump-Accepting-Active-IOMMU-Flags-and-Prototype
>   Crashdump-Accepting-Active-IOMMU-Utility-functions
>   Crashdump-Accepting-Active-IOMMU-Domain-Interfaces
>   Crashdump-Accepting-Active-IOMMU-Copy-Translations
>   Crashdump-Accepting-Active-IOMMU-Debug-Print-IOMMU
>   Crashdump-Accepting-Active-IOMMU-Call-From-Mainline
> 
>  drivers/iommu/intel-iommu.c | 1293 
> ---
>  1 file changed, 1225 insertions(+), 68 deletions(-)

This is a pretty big change, before merging I would like to get more
eyes on it. Please make sure you get more reviews on the code and the
general concept. A few things I noticed while looking at it:

* You put a lot of code into #define CONFIG_CRASH_DUMP. Isn't the same
  approach necessary to for KEXEC?

* Please get rid of all the pr_debug stuff.

* I think it makes sense to put the functions to access the IOMMU
  configuration values of the old kernel into a new file. This is better
  than adding over 1000 new lines to intel-iommu.c

* I have seen your new single-patch for this change. A single patch with
  more than 1200 changed lines is not something anyone is willing to
  review. Please split the patches again in a meaningful and bisectable
  way.


Thanks,

Joerg


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv3 0/6] Crashdump Accepting Active IOMMU

2014-03-04 Thread Joerg Roedel
Hi Bill,

On Fri, Jan 10, 2014 at 03:07:26PM -0700, Bill Sumner wrote:
 Bill Sumner (6):
   Crashdump-Accepting-Active-IOMMU-Flags-and-Prototype
   Crashdump-Accepting-Active-IOMMU-Utility-functions
   Crashdump-Accepting-Active-IOMMU-Domain-Interfaces
   Crashdump-Accepting-Active-IOMMU-Copy-Translations
   Crashdump-Accepting-Active-IOMMU-Debug-Print-IOMMU
   Crashdump-Accepting-Active-IOMMU-Call-From-Mainline
 
  drivers/iommu/intel-iommu.c | 1293 
 ---
  1 file changed, 1225 insertions(+), 68 deletions(-)

This is a pretty big change, before merging I would like to get more
eyes on it. Please make sure you get more reviews on the code and the
general concept. A few things I noticed while looking at it:

* You put a lot of code into #define CONFIG_CRASH_DUMP. Isn't the same
  approach necessary to for KEXEC?

* Please get rid of all the pr_debug stuff.

* I think it makes sense to put the functions to access the IOMMU
  configuration values of the old kernel into a new file. This is better
  than adding over 1000 new lines to intel-iommu.c

* I have seen your new single-patch for this change. A single patch with
  more than 1200 changed lines is not something anyone is willing to
  review. Please split the patches again in a meaningful and bisectable
  way.


Thanks,

Joerg


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv3 0/6] Crashdump Accepting Active IOMMU

2014-01-24 Thread Baoquan He
Tested this patchset on my local HP Z420 workstation, and it works very
well.

Hi Bill,

Thanks for your effort.

There are several concerns from me.

Firstly, I think the patch log need be rearanged. Patchset cover letter
can contain information to express why, how briefly. If you think this
is very useful, it can be split and put into patch log.

Then for each patch, patch log should be accurate and summary to
describe why and how this patch really does. If you feel several patches
have the corelation, they may need to be merged.

Secondly, each patch could get a seperate subject which tells what this
patch really does. Even they are merged to kernel git tree, each of them
is a independent commit. People can take to use or depend only one of
them. Actually, I don't like current patch subject.

Thirdly, this patchset will be part of intel-iommu, though they only
works for kdump kernel. As a subsystem, the style need be consistent. I
like the debug method which introduces a struct pr_debug, however
maintainers may not like it. Because a debug utility may bloat code and
affect people's review. Personally I like refined code, the less the
easier to review. Or put it as a independent patch at the end of the
patchset, let maintainer decide whether it's OK to pull in.

Sorry to say so much, I think this solution is truly the right way. As
you know, it's a big problem for kdump when intel-iommu is active in 1st
kernel. Because of this bug, many machines with intel-iommu have to be
set intel-iommu=off, the performance is affected very much.

Baoquan
Thanks

On 01/10/14 at 03:07pm, Bill Sumner wrote:
> v2->v3:
> 1. Commented-out "#define DEBUG 1" to eliminate debug messages
> 2. Updated the comments about changes in each version in all patches in the 
> set.
> 3. Fixed: one-line added to Copy-Translations" patch to initialize the iovad
>   struct as recommended by Baoquan He [b...@redhat.com]
>   init_iova_domain(>iovad, DMA_32BIT_PFN);
> 
> v1->v2:
> The following series implements a fix for:
> A kdump problem about DMA that has been discussed for a long time. That is,
> when a kernel panics and boots into the kdump kernel, DMA started by the 
> panicked kernel is not stopped before the kdump kernel is booted and the 
> kdump kernel disables the IOMMU while this DMA continues.  This causes the
> IOMMU to stop translating the DMA addresses as IOVAs and begin to treat them 
> as physical memory addresses -- which causes the DMA to either:
> (1) generate DMAR errors or (2) generate PCI SERR errors or (3) transfer  
> data to or from incorrect areas of memory. Often this causes the dump to fail.
> 
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv3 0/6] Crashdump Accepting Active IOMMU

2014-01-24 Thread Baoquan He
Tested this patchset on my local HP Z420 workstation, and it works very
well.

Hi Bill,

Thanks for your effort.

There are several concerns from me.

Firstly, I think the patch log need be rearanged. Patchset cover letter
can contain information to express why, how briefly. If you think this
is very useful, it can be split and put into patch log.

Then for each patch, patch log should be accurate and summary to
describe why and how this patch really does. If you feel several patches
have the corelation, they may need to be merged.

Secondly, each patch could get a seperate subject which tells what this
patch really does. Even they are merged to kernel git tree, each of them
is a independent commit. People can take to use or depend only one of
them. Actually, I don't like current patch subject.

Thirdly, this patchset will be part of intel-iommu, though they only
works for kdump kernel. As a subsystem, the style need be consistent. I
like the debug method which introduces a struct pr_debug, however
maintainers may not like it. Because a debug utility may bloat code and
affect people's review. Personally I like refined code, the less the
easier to review. Or put it as a independent patch at the end of the
patchset, let maintainer decide whether it's OK to pull in.

Sorry to say so much, I think this solution is truly the right way. As
you know, it's a big problem for kdump when intel-iommu is active in 1st
kernel. Because of this bug, many machines with intel-iommu have to be
set intel-iommu=off, the performance is affected very much.

Baoquan
Thanks

On 01/10/14 at 03:07pm, Bill Sumner wrote:
 v2-v3:
 1. Commented-out #define DEBUG 1 to eliminate debug messages
 2. Updated the comments about changes in each version in all patches in the 
 set.
 3. Fixed: one-line added to Copy-Translations patch to initialize the iovad
   struct as recommended by Baoquan He [b...@redhat.com]
   init_iova_domain(domain-iovad, DMA_32BIT_PFN);
 
 v1-v2:
 The following series implements a fix for:
 A kdump problem about DMA that has been discussed for a long time. That is,
 when a kernel panics and boots into the kdump kernel, DMA started by the 
 panicked kernel is not stopped before the kdump kernel is booted and the 
 kdump kernel disables the IOMMU while this DMA continues.  This causes the
 IOMMU to stop translating the DMA addresses as IOVAs and begin to treat them 
 as physical memory addresses -- which causes the DMA to either:
 (1) generate DMAR errors or (2) generate PCI SERR errors or (3) transfer  
 data to or from incorrect areas of memory. Often this causes the dump to fail.
 
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCHv3 0/6] Crashdump Accepting Active IOMMU

2014-01-10 Thread Bill Sumner
v2->v3:
1. Commented-out "#define DEBUG 1" to eliminate debug messages
2. Updated the comments about changes in each version in all patches in the set.
3. Fixed: one-line added to Copy-Translations" patch to initialize the iovad
  struct as recommended by Baoquan He [b...@redhat.com]
  init_iova_domain(>iovad, DMA_32BIT_PFN);

v1->v2:
The following series implements a fix for:
A kdump problem about DMA that has been discussed for a long time. That is,
when a kernel panics and boots into the kdump kernel, DMA started by the 
panicked kernel is not stopped before the kdump kernel is booted and the 
kdump kernel disables the IOMMU while this DMA continues.  This causes the
IOMMU to stop translating the DMA addresses as IOVAs and begin to treat them 
as physical memory addresses -- which causes the DMA to either:
(1) generate DMAR errors or (2) generate PCI SERR errors or (3) transfer  
data to or from incorrect areas of memory. Often this causes the dump to fail.

This patch set modifies the behavior of the iommu in the (new) crashdump 
kernel: 
1. to accept the iommu hardware in an active state, 
2. to leave the current translations in-place so that legacy DMA will continue
   using its current buffers until the device drivers in the crashdump kernel
   initialize and initialize their devices,
3. to use different portions of the iova address ranges for the device drivers
   in the crashdump kernel than the iova ranges that were in-use at the time
   of the panic.  

Advantages of this approach:
1. All manipulation of the IO-device is done by the Linux device-driver
   for that device.
2. This approach behaves in a manner very similar to operation without an
   active iommu.
3. Any activity between the IO-device and its RMRR areas is handled by the
   device-driver in the same manner as during a non-kdump boot.
4. If an IO-device has no driver in the kdump kernel, it is simply left alone.
   This supports the practice of creating a special kdump kernel without
   drivers for any devices that are not required for taking a crashdump. 

Changes since the RFC version of this patch:
1. Consolidated all of the operational code into the "copy..." functions.
   The "process..." functions were primarily used for diagnostics and
   exploration; however, there was a small amount of operational code that
   used the "process..." functions.
   This operational code has been moved into the "copy..." functions.

2. Removed the "Process ..." functions and the diagnostic code that ran
   on that function set.  This removed about 1/4 of the code -- which this
   operational patch set no longer needs.  These portions of the RFC patch
   could be formatted as a separate patch and submitted independently
   at a later date. 

3. Re-formatted the code to the Linux Coding Standards.
   The checkpatch script still finds some lines to complain about;
   however most of these lines are either (1) lines that I did not change,
   or (2) lines that only changed by adding a level of indent which pushed
   them over 80-characters, or (3) new lines whose intent is far clearer when
   longer than 80-characters.

4. Updated the remaining debug print to be significantly more flexible.
   This allows control over the amount of debug print to the console --
   which can vary widely.

5. Fixed a couple of minor bugs found by testing on a machine with a
   very large IO configuration.

At a high level, this code operates primarily during iommu initialization
and device-driver initialization

During intel-iommu hardware initialization:
In intel_iommu_init(void)
* If (This is the crash kernel)
  .  Set flag: crashdump_accepting_active_iommu (all changes below check this)
  .  Skip disabling the iommu hardware translations

In init_dmars()
* Duplicate the intel iommu translation tables from the old kernel
  in the new kernel
  . The root-entry table, all context-entry tables,
and all page-translation-entry tables
  . The duplicate tables contain updated physical addresses to link them 
together.
  . The duplicate tables are mapped into kernel virtual addresses
in the new kernel which allows most of the existing iommu code
to operate without change.
  . Do some minimal sanity-checks during the copy
  . Place the address of the new root-entry structure into "struct intel_iommu"

* Skip setting-up new domains for 'si', 'rmrr', 'isa' 
  . Translations for 'rmrr' and 'isa' ranges have been copied from the old 
kernel
  . This patch has not yet been tested with iommu pass-through enabled

* Existing (unchanged) code near the end of dmar_init:
  . Loads the address of the (now new) root-entry structure from
"struct intel_iommu" into the iommu hardware and does the hardware flushes.
This changes the active translation tables from the ones in the old kernel
to the copies in the new kernel.
  . This is legal because the translations in the two sets of tables are
currently identical:
  Virtualization Technology 

[PATCHv3 0/6] Crashdump Accepting Active IOMMU

2014-01-10 Thread Bill Sumner
v2-v3:
1. Commented-out #define DEBUG 1 to eliminate debug messages
2. Updated the comments about changes in each version in all patches in the set.
3. Fixed: one-line added to Copy-Translations patch to initialize the iovad
  struct as recommended by Baoquan He [b...@redhat.com]
  init_iova_domain(domain-iovad, DMA_32BIT_PFN);

v1-v2:
The following series implements a fix for:
A kdump problem about DMA that has been discussed for a long time. That is,
when a kernel panics and boots into the kdump kernel, DMA started by the 
panicked kernel is not stopped before the kdump kernel is booted and the 
kdump kernel disables the IOMMU while this DMA continues.  This causes the
IOMMU to stop translating the DMA addresses as IOVAs and begin to treat them 
as physical memory addresses -- which causes the DMA to either:
(1) generate DMAR errors or (2) generate PCI SERR errors or (3) transfer  
data to or from incorrect areas of memory. Often this causes the dump to fail.

This patch set modifies the behavior of the iommu in the (new) crashdump 
kernel: 
1. to accept the iommu hardware in an active state, 
2. to leave the current translations in-place so that legacy DMA will continue
   using its current buffers until the device drivers in the crashdump kernel
   initialize and initialize their devices,
3. to use different portions of the iova address ranges for the device drivers
   in the crashdump kernel than the iova ranges that were in-use at the time
   of the panic.  

Advantages of this approach:
1. All manipulation of the IO-device is done by the Linux device-driver
   for that device.
2. This approach behaves in a manner very similar to operation without an
   active iommu.
3. Any activity between the IO-device and its RMRR areas is handled by the
   device-driver in the same manner as during a non-kdump boot.
4. If an IO-device has no driver in the kdump kernel, it is simply left alone.
   This supports the practice of creating a special kdump kernel without
   drivers for any devices that are not required for taking a crashdump. 

Changes since the RFC version of this patch:
1. Consolidated all of the operational code into the copy... functions.
   The process... functions were primarily used for diagnostics and
   exploration; however, there was a small amount of operational code that
   used the process... functions.
   This operational code has been moved into the copy... functions.

2. Removed the Process ... functions and the diagnostic code that ran
   on that function set.  This removed about 1/4 of the code -- which this
   operational patch set no longer needs.  These portions of the RFC patch
   could be formatted as a separate patch and submitted independently
   at a later date. 

3. Re-formatted the code to the Linux Coding Standards.
   The checkpatch script still finds some lines to complain about;
   however most of these lines are either (1) lines that I did not change,
   or (2) lines that only changed by adding a level of indent which pushed
   them over 80-characters, or (3) new lines whose intent is far clearer when
   longer than 80-characters.

4. Updated the remaining debug print to be significantly more flexible.
   This allows control over the amount of debug print to the console --
   which can vary widely.

5. Fixed a couple of minor bugs found by testing on a machine with a
   very large IO configuration.

At a high level, this code operates primarily during iommu initialization
and device-driver initialization

During intel-iommu hardware initialization:
In intel_iommu_init(void)
* If (This is the crash kernel)
  .  Set flag: crashdump_accepting_active_iommu (all changes below check this)
  .  Skip disabling the iommu hardware translations

In init_dmars()
* Duplicate the intel iommu translation tables from the old kernel
  in the new kernel
  . The root-entry table, all context-entry tables,
and all page-translation-entry tables
  . The duplicate tables contain updated physical addresses to link them 
together.
  . The duplicate tables are mapped into kernel virtual addresses
in the new kernel which allows most of the existing iommu code
to operate without change.
  . Do some minimal sanity-checks during the copy
  . Place the address of the new root-entry structure into struct intel_iommu

* Skip setting-up new domains for 'si', 'rmrr', 'isa' 
  . Translations for 'rmrr' and 'isa' ranges have been copied from the old 
kernel
  . This patch has not yet been tested with iommu pass-through enabled

* Existing (unchanged) code near the end of dmar_init:
  . Loads the address of the (now new) root-entry structure from
struct intel_iommu into the iommu hardware and does the hardware flushes.
This changes the active translation tables from the ones in the old kernel
to the copies in the new kernel.
  . This is legal because the translations in the two sets of tables are
currently identical:
  Virtualization Technology for Directed