Re: [PATCH] intel-iommu: Quiesce devices before disabling IOMMU

2013-09-18 Thread David Woodhouse
On Tue, 2013-09-10 at 14:43 +0900, Takao Indoh wrote:
> (2013/09/09 18:07), David Woodhouse wrote:
> > If the driver is so broken that it cannot get the device working again
> > after a fault, surely the driver needs to be fixed?
> 
> Yes,this problem may be solved by fixing driver. Actually megaraid sas
> driver is recently fixed for this problem. (See commit 6431f5d7)
> 
> But I think root cause of this problem is initializing IOMMU while DMA
> is still working, and I want to solve the root cause rather than
> handling it in each driver, otherwise we have to fix driver each time we
> find this kind of problem.

But if the driver is broken and cannot actually recover from hardware
issues, the driver needs to be fixed *anyway*. We shouldn't be papering
over the problem.

> > For the IOMMU code to reset individual devices, just because they still
> > have an active DMA mapping even if they're not *doing* DMA, seems wrong.
>  
> Right, current code is resetting devices which *may* be doing DMA. The
> ideal way is finding devices which are actually doing DMA and reset only
> them but I don't know how we can do this, though I think current code
> is sufficient.

No, that's not the ideal way either. Their DMA will be blocked, and
they'll stop (or at least we'll stop getting an interrupt and reporting
their DMA faults, if the hardware *is* so broken that it keeps trying
over and over again). The new driver will come up and reset the device,
and all will be well.

Do not paper over driver bugs. You are just *encouraging* brokenness.

We need to fix the 'fault storm' issue, by setting the FPD bit in the
context-entry for offending devices when appropriate, and then clearing
it again when appropriate too. But for the IOMMU code to go out and
trigger a PCI reset of random devices and buses is ABSOLUTELY WRONG.

Do Not Do This.

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH] intel-iommu: Quiesce devices before disabling IOMMU

2013-09-18 Thread David Woodhouse
On Tue, 2013-09-10 at 14:43 +0900, Takao Indoh wrote:
 (2013/09/09 18:07), David Woodhouse wrote:
  If the driver is so broken that it cannot get the device working again
  after a fault, surely the driver needs to be fixed?
 
 Yes,this problem may be solved by fixing driver. Actually megaraid sas
 driver is recently fixed for this problem. (See commit 6431f5d7)
 
 But I think root cause of this problem is initializing IOMMU while DMA
 is still working, and I want to solve the root cause rather than
 handling it in each driver, otherwise we have to fix driver each time we
 find this kind of problem.

But if the driver is broken and cannot actually recover from hardware
issues, the driver needs to be fixed *anyway*. We shouldn't be papering
over the problem.

  For the IOMMU code to reset individual devices, just because they still
  have an active DMA mapping even if they're not *doing* DMA, seems wrong.
  
 Right, current code is resetting devices which *may* be doing DMA. The
 ideal way is finding devices which are actually doing DMA and reset only
 them but I don't know how we can do this, though I think current code
 is sufficient.

No, that's not the ideal way either. Their DMA will be blocked, and
they'll stop (or at least we'll stop getting an interrupt and reporting
their DMA faults, if the hardware *is* so broken that it keeps trying
over and over again). The new driver will come up and reset the device,
and all will be well.

Do not paper over driver bugs. You are just *encouraging* brokenness.

We need to fix the 'fault storm' issue, by setting the FPD bit in the
context-entry for offending devices when appropriate, and then clearing
it again when appropriate too. But for the IOMMU code to go out and
trigger a PCI reset of random devices and buses is ABSOLUTELY WRONG.

Do Not Do This.

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH] intel-iommu: Quiesce devices before disabling IOMMU

2013-09-09 Thread Takao Indoh
(2013/09/09 18:07), David Woodhouse wrote:
> On Wed, 2013-08-21 at 16:15 +0900, Takao Indoh wrote:
>>
>> This causes problem on kdump. Devices are working in first kernel, and
>> after switching to second kernel and initializing IOMMU, many DMAR faults
>> occur and it causes problems like driver error or PCI SERR, at last
>> kdump fails. This patch fixes this problem.
> 
> I'm not sure I'd call this a fix.
> 
> If the driver is so broken that it cannot get the device working again
> after a fault, surely the driver needs to be fixed?

Yes,this problem may be solved by fixing driver. Actually megaraid sas
driver is recently fixed for this problem. (See commit 6431f5d7)

But I think root cause of this problem is initializing IOMMU while DMA
is still working, and I want to solve the root cause rather than
handling it in each driver, otherwise we have to fix driver each time we
find this kind of problem.

> 
> If the system is suffering an IRQ storm because device doesn't give up
> after the first few faults, then we should switch off the fault
> *reporting* for that device so that its faults get ignored (until it
> next actually sets up a DMA mapping, or something).

In such a case, yeah limiting messages is enough.

> 
> For the IOMMU code to reset individual devices, just because they still
> have an active DMA mapping even if they're not *doing* DMA, seems wrong.
> You'll even end up resetting devices just because they have an RMRR,
> won't you? (Although I wouldn't lose any sleep over that, I suppose. In
> fact it might be a *feature*... :)
 
Right, current code is resetting devices which *may* be doing DMA. The
ideal way is finding devices which are actually doing DMA and reset only
them but I don't know how we can do this, though I think current code
is sufficient.

Thanks,
Takao Indoh

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] intel-iommu: Quiesce devices before disabling IOMMU

2013-09-09 Thread David Woodhouse
On Wed, 2013-08-21 at 16:15 +0900, Takao Indoh wrote:
> 
> This causes problem on kdump. Devices are working in first kernel, and
> after switching to second kernel and initializing IOMMU, many DMAR faults
> occur and it causes problems like driver error or PCI SERR, at last
> kdump fails. This patch fixes this problem.

I'm not sure I'd call this a fix.

If the driver is so broken that it cannot get the device working again
after a fault, surely the driver needs to be fixed?

If the system is suffering an IRQ storm because device doesn't give up
after the first few faults, then we should switch off the fault
*reporting* for that device so that its faults get ignored (until it
next actually sets up a DMA mapping, or something).

For the IOMMU code to reset individual devices, just because they still
have an active DMA mapping even if they're not *doing* DMA, seems wrong.
You'll even end up resetting devices just because they have an RMRR,
won't you? (Although I wouldn't lose any sleep over that, I suppose. In
fact it might be a *feature*... :)

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation


smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH] intel-iommu: Quiesce devices before disabling IOMMU

2013-09-09 Thread David Woodhouse
On Wed, 2013-08-21 at 16:15 +0900, Takao Indoh wrote:
 
 This causes problem on kdump. Devices are working in first kernel, and
 after switching to second kernel and initializing IOMMU, many DMAR faults
 occur and it causes problems like driver error or PCI SERR, at last
 kdump fails. This patch fixes this problem.

I'm not sure I'd call this a fix.

If the driver is so broken that it cannot get the device working again
after a fault, surely the driver needs to be fixed?

If the system is suffering an IRQ storm because device doesn't give up
after the first few faults, then we should switch off the fault
*reporting* for that device so that its faults get ignored (until it
next actually sets up a DMA mapping, or something).

For the IOMMU code to reset individual devices, just because they still
have an active DMA mapping even if they're not *doing* DMA, seems wrong.
You'll even end up resetting devices just because they have an RMRR,
won't you? (Although I wouldn't lose any sleep over that, I suppose. In
fact it might be a *feature*... :)

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation


smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH] intel-iommu: Quiesce devices before disabling IOMMU

2013-09-09 Thread Takao Indoh
(2013/09/09 18:07), David Woodhouse wrote:
 On Wed, 2013-08-21 at 16:15 +0900, Takao Indoh wrote:

 This causes problem on kdump. Devices are working in first kernel, and
 after switching to second kernel and initializing IOMMU, many DMAR faults
 occur and it causes problems like driver error or PCI SERR, at last
 kdump fails. This patch fixes this problem.
 
 I'm not sure I'd call this a fix.
 
 If the driver is so broken that it cannot get the device working again
 after a fault, surely the driver needs to be fixed?

Yes,this problem may be solved by fixing driver. Actually megaraid sas
driver is recently fixed for this problem. (See commit 6431f5d7)

But I think root cause of this problem is initializing IOMMU while DMA
is still working, and I want to solve the root cause rather than
handling it in each driver, otherwise we have to fix driver each time we
find this kind of problem.

 
 If the system is suffering an IRQ storm because device doesn't give up
 after the first few faults, then we should switch off the fault
 *reporting* for that device so that its faults get ignored (until it
 next actually sets up a DMA mapping, or something).

In such a case, yeah limiting messages is enough.

 
 For the IOMMU code to reset individual devices, just because they still
 have an active DMA mapping even if they're not *doing* DMA, seems wrong.
 You'll even end up resetting devices just because they have an RMRR,
 won't you? (Although I wouldn't lose any sleep over that, I suppose. In
 fact it might be a *feature*... :)
 
Right, current code is resetting devices which *may* be doing DMA. The
ideal way is finding devices which are actually doing DMA and reset only
them but I don't know how we can do this, though I think current code
is sufficient.

Thanks,
Takao Indoh

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] intel-iommu: Quiesce devices before disabling IOMMU

2013-09-08 Thread Takao Indoh
(2013/09/09 13:28), Takao Indoh wrote:
>> BTW, what's the status of Alex's PCI patchset which this patch depends
>> on?
> 
> It is merged to Bjorn's tree, will be in v3.12.

This was wrong. Alex's patchset is already in Linus tree, v3.11.

Thanks,
Takao Indoh


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] intel-iommu: Quiesce devices before disabling IOMMU

2013-09-08 Thread Takao Indoh
(2013/09/08 20:47), Baoquan He wrote:
> Hi,
> Patch is great and works on my HP-z420.

Thank you for your test!

> There are several small concerns, please see inline comments.
> 
> On 08/21/13 at 04:15pm, Takao Indoh wrote:
>> This patch quiesces devices before disabling IOMMU on boot to stop
>> ongoing DMA. In intel_iommu_init(), check context entries and if there
>> is entry whose present bit is set then reset corresponding device.
>>
>> When IOMMU is already enabled on boot, it is disabled and new DMAR table
>> is created and then re-enabled in intel_iommu_init(). This causes DMAR
>> faults if there are in-flight DMAs.
>>
>> This causes problem on kdump. Devices are working in first kernel, and
>> after switching to second kernel and initializing IOMMU, many DMAR faults
>> occur and it causes problems like driver error or PCI SERR, at last
>> kdump fails. This patch fixes this problem.
>>
>> Signed-off-by: Takao Indoh 
>>
>>
>> NOTE:
>> To reset devices this patch uses bus reset interface introduced by
>> following commits in PCI "next" branch.
>>
>> 64e8674fbe6bc848333a9b7e19f8cc019dde9eab
>> 5c32b35b004f5ef70dcf62bbc42b8bed1e50b471
>> 2e35afaefe64946caaecfacaf7fb568e46185e88
>> 608c388122c72e1bf11ba8113434eb3d0c40c32d
>> 77cb985ad4acbe66a92ead1bb826deffa47dd33f
>> 090a3c5322e900f468b3205b76d0837003ad57b2
>> a6cbaadea0af9b4aa6eee2882f2aa761ab91a4f8
>> de0c548c33429cc78fd47a3c190c6d00b0e4e441
>> 1b95ce8fc9c12fdb60047f2f9950f29e76e7c66d
>> ---
>>   drivers/iommu/intel-iommu.c |   55 
>> ++-
>>   1 files changed, 54 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index eec0d3e..efb98eb 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -3663,6 +3663,56 @@ static struct notifier_block device_nb = {
>>  .notifier_call = device_notifier,
>>   };
>>   
>> +/* Reset PCI devices if its entry exists in DMAR table */
>> +static void __init iommu_reset_devices(struct intel_iommu *iommu, u16 
>> segment)
>> +{
>> +u64 addr;
>> +struct root_entry *root;
>> +struct context_entry *context;
>> +int bus, devfn;
>> +struct pci_dev *dev;
>> +
>> +addr = dmar_readq(iommu->reg + DMAR_RTADDR_REG);
>> +if (!addr)
>> +return;
>> +
>> +/*
>> + *  In the case of kdump, ioremap is needed because root-entry table
>> + *  exists in first kernel's memory area which is not mapped in second
>> + *  kernel
>> + */
>> +root = (struct root_entry *)ioremap(addr, PAGE_SIZE);
>> +if (!root)
>> +return;
>> +
>> +for (bus = 0; bus < ROOT_ENTRY_NR; bus++) {
>> +if (!root_present([bus]))
>> +continue;
>> +
>> +context = (struct context_entry *)ioremap(
>> +root[bus].val & VTD_PAGE_MASK, PAGE_SIZE);
>> +if (!context)
>> +continue;
>> +
>> +for (devfn = 0; devfn < ROOT_ENTRY_NR; devfn++) {
> For context_entry table, the context_entry has the same size as
> root_entry currently, so it's also correct to use ROOT_ENTRY_NR. But for
> future extention and removing confusion, can we define a new MACRO on
> calculating the size of context_entry table, e.g CONTEXT_ENTRY_NR?

That makes sense, will do in next version.


> 
>> +if (!context_present([devfn]))
>> +continue;
>> +
>> +dev = pci_get_domain_bus_and_slot(segment, bus, devfn);
>> +if (!dev)
>> +continue;
>> +
>> +if (!pci_reset_bus(dev->bus)) /* go to next bus */
> 
> Here, we have got the specific dev, why don't we just call
> pci_reset_function? If call pci_reset_bus here, will it repeat resetting
> devices on the same bus many times?

pci_reset_bus() can reset all devices on the same bus at the same time.
I think it is better than calling pci_reset_function() many times for
each device/function.

When pci_reset_bus() returns 0 (reset succeeded), we can immediately go
out of devfn loop by "break" and go to next bus loop.

> 
>> +break;
>> +else /* Try per-function reset */
>> +pci_reset_function(dev);
>> +
>> +}
>> +iounmap(context);
>> +}
>> +iounmap(root);
>> +}
>> +
>>   int __init intel_iommu_init(void)
>>   {
>>  int ret = 0;
>> @@ -3687,8 +3737,11 @@ int __init intel_iommu_init(void)
>>  continue;
>>   
>>  iommu = drhd->iommu;
>> -if (iommu->gcmd & DMA_GCMD_TE)
>> +if (iommu->gcmd & DMA_GCMD_TE) {
>> +if (reset_devices)
>> +iommu_reset_devices(iommu, drhd->segment);
> 
> I remember the double reset issue vivek concerned in the old patch. Here
> the iommu reset is done at the very beginning of pci_iommu_init, it's

Re: [PATCH] intel-iommu: Quiesce devices before disabling IOMMU

2013-09-08 Thread Baoquan He
Hi,
Patch is great and works on my HP-z420.
There are several small concerns, please see inline comments.

On 08/21/13 at 04:15pm, Takao Indoh wrote:
> This patch quiesces devices before disabling IOMMU on boot to stop
> ongoing DMA. In intel_iommu_init(), check context entries and if there
> is entry whose present bit is set then reset corresponding device.
> 
> When IOMMU is already enabled on boot, it is disabled and new DMAR table
> is created and then re-enabled in intel_iommu_init(). This causes DMAR
> faults if there are in-flight DMAs.
> 
> This causes problem on kdump. Devices are working in first kernel, and
> after switching to second kernel and initializing IOMMU, many DMAR faults
> occur and it causes problems like driver error or PCI SERR, at last
> kdump fails. This patch fixes this problem.
> 
> Signed-off-by: Takao Indoh 
> 
> 
> NOTE:
> To reset devices this patch uses bus reset interface introduced by
> following commits in PCI "next" branch.
> 
> 64e8674fbe6bc848333a9b7e19f8cc019dde9eab
> 5c32b35b004f5ef70dcf62bbc42b8bed1e50b471
> 2e35afaefe64946caaecfacaf7fb568e46185e88
> 608c388122c72e1bf11ba8113434eb3d0c40c32d
> 77cb985ad4acbe66a92ead1bb826deffa47dd33f
> 090a3c5322e900f468b3205b76d0837003ad57b2
> a6cbaadea0af9b4aa6eee2882f2aa761ab91a4f8
> de0c548c33429cc78fd47a3c190c6d00b0e4e441
> 1b95ce8fc9c12fdb60047f2f9950f29e76e7c66d
> ---
>  drivers/iommu/intel-iommu.c |   55 
> ++-
>  1 files changed, 54 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index eec0d3e..efb98eb 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -3663,6 +3663,56 @@ static struct notifier_block device_nb = {
>   .notifier_call = device_notifier,
>  };
>  
> +/* Reset PCI devices if its entry exists in DMAR table */
> +static void __init iommu_reset_devices(struct intel_iommu *iommu, u16 
> segment)
> +{
> + u64 addr;
> + struct root_entry *root;
> + struct context_entry *context;
> + int bus, devfn;
> + struct pci_dev *dev;
> +
> + addr = dmar_readq(iommu->reg + DMAR_RTADDR_REG);
> + if (!addr)
> + return;
> +
> + /*
> +  *  In the case of kdump, ioremap is needed because root-entry table
> +  *  exists in first kernel's memory area which is not mapped in second
> +  *  kernel
> +  */
> + root = (struct root_entry *)ioremap(addr, PAGE_SIZE);
> + if (!root)
> + return;
> +
> + for (bus = 0; bus < ROOT_ENTRY_NR; bus++) {
> + if (!root_present([bus]))
> + continue;
> +
> + context = (struct context_entry *)ioremap(
> + root[bus].val & VTD_PAGE_MASK, PAGE_SIZE);
> + if (!context)
> + continue;
> +
> + for (devfn = 0; devfn < ROOT_ENTRY_NR; devfn++) {
For context_entry table, the context_entry has the same size as
root_entry currently, so it's also correct to use ROOT_ENTRY_NR. But for
future extention and removing confusion, can we define a new MACRO on
calculating the size of context_entry table, e.g CONTEXT_ENTRY_NR?

> + if (!context_present([devfn]))
> + continue;
> +
> + dev = pci_get_domain_bus_and_slot(segment, bus, devfn);
> + if (!dev)
> + continue;
> +
> + if (!pci_reset_bus(dev->bus)) /* go to next bus */

Here, we have got the specific dev, why don't we just call
pci_reset_function? If call pci_reset_bus here, will it repeat resetting
devices on the same bus many times?

> + break;
> + else /* Try per-function reset */
> + pci_reset_function(dev);
> +
> + }
> + iounmap(context);
> + }
> + iounmap(root);
> +}
> +
>  int __init intel_iommu_init(void)
>  {
>   int ret = 0;
> @@ -3687,8 +3737,11 @@ int __init intel_iommu_init(void)
>   continue;
>  
>   iommu = drhd->iommu;
> - if (iommu->gcmd & DMA_GCMD_TE)
> + if (iommu->gcmd & DMA_GCMD_TE) {
> + if (reset_devices)
> + iommu_reset_devices(iommu, drhd->segment);

I remember the double reset issue vivek concerned in the old patch. Here
the iommu reset is done at the very beginning of pci_iommu_init, it's
after the bus subsystem init, it means here only iommu reset, am I
right?

If yes, I think this patch is clear and logic is simple.

BTW, what's the status of Alex's PCI patchset which this patch depends
on?

Baoquan
Thanks

>   iommu_disable_translation(iommu);
> + }
>   }
>  
>   if (dmar_dev_scope_init() < 0) {
> -- 
> 1.7.1
> 
> 
> 
> ___
> kexec mailing list
> ke...@lists.infradead.org
> 

Re: [PATCH] intel-iommu: Quiesce devices before disabling IOMMU

2013-09-08 Thread Baoquan He
Hi,
Patch is great and works on my HP-z420.
There are several small concerns, please see inline comments.

On 08/21/13 at 04:15pm, Takao Indoh wrote:
 This patch quiesces devices before disabling IOMMU on boot to stop
 ongoing DMA. In intel_iommu_init(), check context entries and if there
 is entry whose present bit is set then reset corresponding device.
 
 When IOMMU is already enabled on boot, it is disabled and new DMAR table
 is created and then re-enabled in intel_iommu_init(). This causes DMAR
 faults if there are in-flight DMAs.
 
 This causes problem on kdump. Devices are working in first kernel, and
 after switching to second kernel and initializing IOMMU, many DMAR faults
 occur and it causes problems like driver error or PCI SERR, at last
 kdump fails. This patch fixes this problem.
 
 Signed-off-by: Takao Indoh indou.ta...@jp.fujitsu.com
 
 
 NOTE:
 To reset devices this patch uses bus reset interface introduced by
 following commits in PCI next branch.
 
 64e8674fbe6bc848333a9b7e19f8cc019dde9eab
 5c32b35b004f5ef70dcf62bbc42b8bed1e50b471
 2e35afaefe64946caaecfacaf7fb568e46185e88
 608c388122c72e1bf11ba8113434eb3d0c40c32d
 77cb985ad4acbe66a92ead1bb826deffa47dd33f
 090a3c5322e900f468b3205b76d0837003ad57b2
 a6cbaadea0af9b4aa6eee2882f2aa761ab91a4f8
 de0c548c33429cc78fd47a3c190c6d00b0e4e441
 1b95ce8fc9c12fdb60047f2f9950f29e76e7c66d
 ---
  drivers/iommu/intel-iommu.c |   55 
 ++-
  1 files changed, 54 insertions(+), 1 deletions(-)
 
 diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
 index eec0d3e..efb98eb 100644
 --- a/drivers/iommu/intel-iommu.c
 +++ b/drivers/iommu/intel-iommu.c
 @@ -3663,6 +3663,56 @@ static struct notifier_block device_nb = {
   .notifier_call = device_notifier,
  };
  
 +/* Reset PCI devices if its entry exists in DMAR table */
 +static void __init iommu_reset_devices(struct intel_iommu *iommu, u16 
 segment)
 +{
 + u64 addr;
 + struct root_entry *root;
 + struct context_entry *context;
 + int bus, devfn;
 + struct pci_dev *dev;
 +
 + addr = dmar_readq(iommu-reg + DMAR_RTADDR_REG);
 + if (!addr)
 + return;
 +
 + /*
 +  *  In the case of kdump, ioremap is needed because root-entry table
 +  *  exists in first kernel's memory area which is not mapped in second
 +  *  kernel
 +  */
 + root = (struct root_entry *)ioremap(addr, PAGE_SIZE);
 + if (!root)
 + return;
 +
 + for (bus = 0; bus  ROOT_ENTRY_NR; bus++) {
 + if (!root_present(root[bus]))
 + continue;
 +
 + context = (struct context_entry *)ioremap(
 + root[bus].val  VTD_PAGE_MASK, PAGE_SIZE);
 + if (!context)
 + continue;
 +
 + for (devfn = 0; devfn  ROOT_ENTRY_NR; devfn++) {
For context_entry table, the context_entry has the same size as
root_entry currently, so it's also correct to use ROOT_ENTRY_NR. But for
future extention and removing confusion, can we define a new MACRO on
calculating the size of context_entry table, e.g CONTEXT_ENTRY_NR?

 + if (!context_present(context[devfn]))
 + continue;
 +
 + dev = pci_get_domain_bus_and_slot(segment, bus, devfn);
 + if (!dev)
 + continue;
 +
 + if (!pci_reset_bus(dev-bus)) /* go to next bus */

Here, we have got the specific dev, why don't we just call
pci_reset_function? If call pci_reset_bus here, will it repeat resetting
devices on the same bus many times?

 + break;
 + else /* Try per-function reset */
 + pci_reset_function(dev);
 +
 + }
 + iounmap(context);
 + }
 + iounmap(root);
 +}
 +
  int __init intel_iommu_init(void)
  {
   int ret = 0;
 @@ -3687,8 +3737,11 @@ int __init intel_iommu_init(void)
   continue;
  
   iommu = drhd-iommu;
 - if (iommu-gcmd  DMA_GCMD_TE)
 + if (iommu-gcmd  DMA_GCMD_TE) {
 + if (reset_devices)
 + iommu_reset_devices(iommu, drhd-segment);

I remember the double reset issue vivek concerned in the old patch. Here
the iommu reset is done at the very beginning of pci_iommu_init, it's
after the bus subsystem init, it means here only iommu reset, am I
right?

If yes, I think this patch is clear and logic is simple.

BTW, what's the status of Alex's PCI patchset which this patch depends
on?

Baoquan
Thanks

   iommu_disable_translation(iommu);
 + }
   }
  
   if (dmar_dev_scope_init()  0) {
 -- 
 1.7.1
 
 
 
 ___
 kexec mailing list
 ke...@lists.infradead.org
 http://lists.infradead.org/mailman/listinfo/kexec
--
To unsubscribe from this list: send the line unsubscribe 

Re: [PATCH] intel-iommu: Quiesce devices before disabling IOMMU

2013-09-08 Thread Takao Indoh
(2013/09/08 20:47), Baoquan He wrote:
 Hi,
 Patch is great and works on my HP-z420.

Thank you for your test!

 There are several small concerns, please see inline comments.
 
 On 08/21/13 at 04:15pm, Takao Indoh wrote:
 This patch quiesces devices before disabling IOMMU on boot to stop
 ongoing DMA. In intel_iommu_init(), check context entries and if there
 is entry whose present bit is set then reset corresponding device.

 When IOMMU is already enabled on boot, it is disabled and new DMAR table
 is created and then re-enabled in intel_iommu_init(). This causes DMAR
 faults if there are in-flight DMAs.

 This causes problem on kdump. Devices are working in first kernel, and
 after switching to second kernel and initializing IOMMU, many DMAR faults
 occur and it causes problems like driver error or PCI SERR, at last
 kdump fails. This patch fixes this problem.

 Signed-off-by: Takao Indoh indou.ta...@jp.fujitsu.com


 NOTE:
 To reset devices this patch uses bus reset interface introduced by
 following commits in PCI next branch.

 64e8674fbe6bc848333a9b7e19f8cc019dde9eab
 5c32b35b004f5ef70dcf62bbc42b8bed1e50b471
 2e35afaefe64946caaecfacaf7fb568e46185e88
 608c388122c72e1bf11ba8113434eb3d0c40c32d
 77cb985ad4acbe66a92ead1bb826deffa47dd33f
 090a3c5322e900f468b3205b76d0837003ad57b2
 a6cbaadea0af9b4aa6eee2882f2aa761ab91a4f8
 de0c548c33429cc78fd47a3c190c6d00b0e4e441
 1b95ce8fc9c12fdb60047f2f9950f29e76e7c66d
 ---
   drivers/iommu/intel-iommu.c |   55 
 ++-
   1 files changed, 54 insertions(+), 1 deletions(-)

 diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
 index eec0d3e..efb98eb 100644
 --- a/drivers/iommu/intel-iommu.c
 +++ b/drivers/iommu/intel-iommu.c
 @@ -3663,6 +3663,56 @@ static struct notifier_block device_nb = {
  .notifier_call = device_notifier,
   };
   
 +/* Reset PCI devices if its entry exists in DMAR table */
 +static void __init iommu_reset_devices(struct intel_iommu *iommu, u16 
 segment)
 +{
 +u64 addr;
 +struct root_entry *root;
 +struct context_entry *context;
 +int bus, devfn;
 +struct pci_dev *dev;
 +
 +addr = dmar_readq(iommu-reg + DMAR_RTADDR_REG);
 +if (!addr)
 +return;
 +
 +/*
 + *  In the case of kdump, ioremap is needed because root-entry table
 + *  exists in first kernel's memory area which is not mapped in second
 + *  kernel
 + */
 +root = (struct root_entry *)ioremap(addr, PAGE_SIZE);
 +if (!root)
 +return;
 +
 +for (bus = 0; bus  ROOT_ENTRY_NR; bus++) {
 +if (!root_present(root[bus]))
 +continue;
 +
 +context = (struct context_entry *)ioremap(
 +root[bus].val  VTD_PAGE_MASK, PAGE_SIZE);
 +if (!context)
 +continue;
 +
 +for (devfn = 0; devfn  ROOT_ENTRY_NR; devfn++) {
 For context_entry table, the context_entry has the same size as
 root_entry currently, so it's also correct to use ROOT_ENTRY_NR. But for
 future extention and removing confusion, can we define a new MACRO on
 calculating the size of context_entry table, e.g CONTEXT_ENTRY_NR?

That makes sense, will do in next version.


 
 +if (!context_present(context[devfn]))
 +continue;
 +
 +dev = pci_get_domain_bus_and_slot(segment, bus, devfn);
 +if (!dev)
 +continue;
 +
 +if (!pci_reset_bus(dev-bus)) /* go to next bus */
 
 Here, we have got the specific dev, why don't we just call
 pci_reset_function? If call pci_reset_bus here, will it repeat resetting
 devices on the same bus many times?

pci_reset_bus() can reset all devices on the same bus at the same time.
I think it is better than calling pci_reset_function() many times for
each device/function.

When pci_reset_bus() returns 0 (reset succeeded), we can immediately go
out of devfn loop by break and go to next bus loop.

 
 +break;
 +else /* Try per-function reset */
 +pci_reset_function(dev);
 +
 +}
 +iounmap(context);
 +}
 +iounmap(root);
 +}
 +
   int __init intel_iommu_init(void)
   {
  int ret = 0;
 @@ -3687,8 +3737,11 @@ int __init intel_iommu_init(void)
  continue;
   
  iommu = drhd-iommu;
 -if (iommu-gcmd  DMA_GCMD_TE)
 +if (iommu-gcmd  DMA_GCMD_TE) {
 +if (reset_devices)
 +iommu_reset_devices(iommu, drhd-segment);
 
 I remember the double reset issue vivek concerned in the old patch. Here
 the iommu reset is done at the very beginning of pci_iommu_init, it's
 after the bus subsystem init, it means here only iommu reset, am I
 right?

Yes, exactly.

 
 If yes, I think this patch is clear and logic is simple.
 
 BTW, what's the status of Alex's PCI patchset 

Re: [PATCH] intel-iommu: Quiesce devices before disabling IOMMU

2013-09-08 Thread Takao Indoh
(2013/09/09 13:28), Takao Indoh wrote:
 BTW, what's the status of Alex's PCI patchset which this patch depends
 on?
 
 It is merged to Bjorn's tree, will be in v3.12.

This was wrong. Alex's patchset is already in Linus tree, v3.11.

Thanks,
Takao Indoh


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] intel-iommu: Quiesce devices before disabling IOMMU

2013-08-21 Thread Takao Indoh
This patch quiesces devices before disabling IOMMU on boot to stop
ongoing DMA. In intel_iommu_init(), check context entries and if there
is entry whose present bit is set then reset corresponding device.

When IOMMU is already enabled on boot, it is disabled and new DMAR table
is created and then re-enabled in intel_iommu_init(). This causes DMAR
faults if there are in-flight DMAs.

This causes problem on kdump. Devices are working in first kernel, and
after switching to second kernel and initializing IOMMU, many DMAR faults
occur and it causes problems like driver error or PCI SERR, at last
kdump fails. This patch fixes this problem.

Signed-off-by: Takao Indoh 


NOTE:
To reset devices this patch uses bus reset interface introduced by
following commits in PCI "next" branch.

64e8674fbe6bc848333a9b7e19f8cc019dde9eab
5c32b35b004f5ef70dcf62bbc42b8bed1e50b471
2e35afaefe64946caaecfacaf7fb568e46185e88
608c388122c72e1bf11ba8113434eb3d0c40c32d
77cb985ad4acbe66a92ead1bb826deffa47dd33f
090a3c5322e900f468b3205b76d0837003ad57b2
a6cbaadea0af9b4aa6eee2882f2aa761ab91a4f8
de0c548c33429cc78fd47a3c190c6d00b0e4e441
1b95ce8fc9c12fdb60047f2f9950f29e76e7c66d
---
 drivers/iommu/intel-iommu.c |   55 ++-
 1 files changed, 54 insertions(+), 1 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index eec0d3e..efb98eb 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3663,6 +3663,56 @@ static struct notifier_block device_nb = {
.notifier_call = device_notifier,
 };
 
+/* Reset PCI devices if its entry exists in DMAR table */
+static void __init iommu_reset_devices(struct intel_iommu *iommu, u16 segment)
+{
+   u64 addr;
+   struct root_entry *root;
+   struct context_entry *context;
+   int bus, devfn;
+   struct pci_dev *dev;
+
+   addr = dmar_readq(iommu->reg + DMAR_RTADDR_REG);
+   if (!addr)
+   return;
+
+   /*
+*  In the case of kdump, ioremap is needed because root-entry table
+*  exists in first kernel's memory area which is not mapped in second
+*  kernel
+*/
+   root = (struct root_entry *)ioremap(addr, PAGE_SIZE);
+   if (!root)
+   return;
+
+   for (bus = 0; bus < ROOT_ENTRY_NR; bus++) {
+   if (!root_present([bus]))
+   continue;
+
+   context = (struct context_entry *)ioremap(
+   root[bus].val & VTD_PAGE_MASK, PAGE_SIZE);
+   if (!context)
+   continue;
+
+   for (devfn = 0; devfn < ROOT_ENTRY_NR; devfn++) {
+   if (!context_present([devfn]))
+   continue;
+
+   dev = pci_get_domain_bus_and_slot(segment, bus, devfn);
+   if (!dev)
+   continue;
+
+   if (!pci_reset_bus(dev->bus)) /* go to next bus */
+   break;
+   else /* Try per-function reset */
+   pci_reset_function(dev);
+
+   }
+   iounmap(context);
+   }
+   iounmap(root);
+}
+
 int __init intel_iommu_init(void)
 {
int ret = 0;
@@ -3687,8 +3737,11 @@ int __init intel_iommu_init(void)
continue;
 
iommu = drhd->iommu;
-   if (iommu->gcmd & DMA_GCMD_TE)
+   if (iommu->gcmd & DMA_GCMD_TE) {
+   if (reset_devices)
+   iommu_reset_devices(iommu, drhd->segment);
iommu_disable_translation(iommu);
+   }
}
 
if (dmar_dev_scope_init() < 0) {
-- 
1.7.1


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] intel-iommu: Quiesce devices before disabling IOMMU

2013-08-21 Thread Takao Indoh
This patch quiesces devices before disabling IOMMU on boot to stop
ongoing DMA. In intel_iommu_init(), check context entries and if there
is entry whose present bit is set then reset corresponding device.

When IOMMU is already enabled on boot, it is disabled and new DMAR table
is created and then re-enabled in intel_iommu_init(). This causes DMAR
faults if there are in-flight DMAs.

This causes problem on kdump. Devices are working in first kernel, and
after switching to second kernel and initializing IOMMU, many DMAR faults
occur and it causes problems like driver error or PCI SERR, at last
kdump fails. This patch fixes this problem.

Signed-off-by: Takao Indoh indou.ta...@jp.fujitsu.com


NOTE:
To reset devices this patch uses bus reset interface introduced by
following commits in PCI next branch.

64e8674fbe6bc848333a9b7e19f8cc019dde9eab
5c32b35b004f5ef70dcf62bbc42b8bed1e50b471
2e35afaefe64946caaecfacaf7fb568e46185e88
608c388122c72e1bf11ba8113434eb3d0c40c32d
77cb985ad4acbe66a92ead1bb826deffa47dd33f
090a3c5322e900f468b3205b76d0837003ad57b2
a6cbaadea0af9b4aa6eee2882f2aa761ab91a4f8
de0c548c33429cc78fd47a3c190c6d00b0e4e441
1b95ce8fc9c12fdb60047f2f9950f29e76e7c66d
---
 drivers/iommu/intel-iommu.c |   55 ++-
 1 files changed, 54 insertions(+), 1 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index eec0d3e..efb98eb 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3663,6 +3663,56 @@ static struct notifier_block device_nb = {
.notifier_call = device_notifier,
 };
 
+/* Reset PCI devices if its entry exists in DMAR table */
+static void __init iommu_reset_devices(struct intel_iommu *iommu, u16 segment)
+{
+   u64 addr;
+   struct root_entry *root;
+   struct context_entry *context;
+   int bus, devfn;
+   struct pci_dev *dev;
+
+   addr = dmar_readq(iommu-reg + DMAR_RTADDR_REG);
+   if (!addr)
+   return;
+
+   /*
+*  In the case of kdump, ioremap is needed because root-entry table
+*  exists in first kernel's memory area which is not mapped in second
+*  kernel
+*/
+   root = (struct root_entry *)ioremap(addr, PAGE_SIZE);
+   if (!root)
+   return;
+
+   for (bus = 0; bus  ROOT_ENTRY_NR; bus++) {
+   if (!root_present(root[bus]))
+   continue;
+
+   context = (struct context_entry *)ioremap(
+   root[bus].val  VTD_PAGE_MASK, PAGE_SIZE);
+   if (!context)
+   continue;
+
+   for (devfn = 0; devfn  ROOT_ENTRY_NR; devfn++) {
+   if (!context_present(context[devfn]))
+   continue;
+
+   dev = pci_get_domain_bus_and_slot(segment, bus, devfn);
+   if (!dev)
+   continue;
+
+   if (!pci_reset_bus(dev-bus)) /* go to next bus */
+   break;
+   else /* Try per-function reset */
+   pci_reset_function(dev);
+
+   }
+   iounmap(context);
+   }
+   iounmap(root);
+}
+
 int __init intel_iommu_init(void)
 {
int ret = 0;
@@ -3687,8 +3737,11 @@ int __init intel_iommu_init(void)
continue;
 
iommu = drhd-iommu;
-   if (iommu-gcmd  DMA_GCMD_TE)
+   if (iommu-gcmd  DMA_GCMD_TE) {
+   if (reset_devices)
+   iommu_reset_devices(iommu, drhd-segment);
iommu_disable_translation(iommu);
+   }
}
 
if (dmar_dev_scope_init()  0) {
-- 
1.7.1


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/