Re: [PATCH] intel-iommu: Quiesce devices before disabling IOMMU
On Tue, 2013-09-10 at 14:43 +0900, Takao Indoh wrote: > (2013/09/09 18:07), David Woodhouse wrote: > > If the driver is so broken that it cannot get the device working again > > after a fault, surely the driver needs to be fixed? > > Yes,this problem may be solved by fixing driver. Actually megaraid sas > driver is recently fixed for this problem. (See commit 6431f5d7) > > But I think root cause of this problem is initializing IOMMU while DMA > is still working, and I want to solve the root cause rather than > handling it in each driver, otherwise we have to fix driver each time we > find this kind of problem. But if the driver is broken and cannot actually recover from hardware issues, the driver needs to be fixed *anyway*. We shouldn't be papering over the problem. > > For the IOMMU code to reset individual devices, just because they still > > have an active DMA mapping even if they're not *doing* DMA, seems wrong. > > Right, current code is resetting devices which *may* be doing DMA. The > ideal way is finding devices which are actually doing DMA and reset only > them but I don't know how we can do this, though I think current code > is sufficient. No, that's not the ideal way either. Their DMA will be blocked, and they'll stop (or at least we'll stop getting an interrupt and reporting their DMA faults, if the hardware *is* so broken that it keeps trying over and over again). The new driver will come up and reset the device, and all will be well. Do not paper over driver bugs. You are just *encouraging* brokenness. We need to fix the 'fault storm' issue, by setting the FPD bit in the context-entry for offending devices when appropriate, and then clearing it again when appropriate too. But for the IOMMU code to go out and trigger a PCI reset of random devices and buses is ABSOLUTELY WRONG. Do Not Do This. -- dwmw2 smime.p7s Description: S/MIME cryptographic signature
Re: [PATCH] intel-iommu: Quiesce devices before disabling IOMMU
On Tue, 2013-09-10 at 14:43 +0900, Takao Indoh wrote: (2013/09/09 18:07), David Woodhouse wrote: If the driver is so broken that it cannot get the device working again after a fault, surely the driver needs to be fixed? Yes,this problem may be solved by fixing driver. Actually megaraid sas driver is recently fixed for this problem. (See commit 6431f5d7) But I think root cause of this problem is initializing IOMMU while DMA is still working, and I want to solve the root cause rather than handling it in each driver, otherwise we have to fix driver each time we find this kind of problem. But if the driver is broken and cannot actually recover from hardware issues, the driver needs to be fixed *anyway*. We shouldn't be papering over the problem. For the IOMMU code to reset individual devices, just because they still have an active DMA mapping even if they're not *doing* DMA, seems wrong. Right, current code is resetting devices which *may* be doing DMA. The ideal way is finding devices which are actually doing DMA and reset only them but I don't know how we can do this, though I think current code is sufficient. No, that's not the ideal way either. Their DMA will be blocked, and they'll stop (or at least we'll stop getting an interrupt and reporting their DMA faults, if the hardware *is* so broken that it keeps trying over and over again). The new driver will come up and reset the device, and all will be well. Do not paper over driver bugs. You are just *encouraging* brokenness. We need to fix the 'fault storm' issue, by setting the FPD bit in the context-entry for offending devices when appropriate, and then clearing it again when appropriate too. But for the IOMMU code to go out and trigger a PCI reset of random devices and buses is ABSOLUTELY WRONG. Do Not Do This. -- dwmw2 smime.p7s Description: S/MIME cryptographic signature
Re: [PATCH] intel-iommu: Quiesce devices before disabling IOMMU
(2013/09/09 18:07), David Woodhouse wrote: > On Wed, 2013-08-21 at 16:15 +0900, Takao Indoh wrote: >> >> This causes problem on kdump. Devices are working in first kernel, and >> after switching to second kernel and initializing IOMMU, many DMAR faults >> occur and it causes problems like driver error or PCI SERR, at last >> kdump fails. This patch fixes this problem. > > I'm not sure I'd call this a fix. > > If the driver is so broken that it cannot get the device working again > after a fault, surely the driver needs to be fixed? Yes,this problem may be solved by fixing driver. Actually megaraid sas driver is recently fixed for this problem. (See commit 6431f5d7) But I think root cause of this problem is initializing IOMMU while DMA is still working, and I want to solve the root cause rather than handling it in each driver, otherwise we have to fix driver each time we find this kind of problem. > > If the system is suffering an IRQ storm because device doesn't give up > after the first few faults, then we should switch off the fault > *reporting* for that device so that its faults get ignored (until it > next actually sets up a DMA mapping, or something). In such a case, yeah limiting messages is enough. > > For the IOMMU code to reset individual devices, just because they still > have an active DMA mapping even if they're not *doing* DMA, seems wrong. > You'll even end up resetting devices just because they have an RMRR, > won't you? (Although I wouldn't lose any sleep over that, I suppose. In > fact it might be a *feature*... :) Right, current code is resetting devices which *may* be doing DMA. The ideal way is finding devices which are actually doing DMA and reset only them but I don't know how we can do this, though I think current code is sufficient. Thanks, Takao Indoh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] intel-iommu: Quiesce devices before disabling IOMMU
On Wed, 2013-08-21 at 16:15 +0900, Takao Indoh wrote: > > This causes problem on kdump. Devices are working in first kernel, and > after switching to second kernel and initializing IOMMU, many DMAR faults > occur and it causes problems like driver error or PCI SERR, at last > kdump fails. This patch fixes this problem. I'm not sure I'd call this a fix. If the driver is so broken that it cannot get the device working again after a fault, surely the driver needs to be fixed? If the system is suffering an IRQ storm because device doesn't give up after the first few faults, then we should switch off the fault *reporting* for that device so that its faults get ignored (until it next actually sets up a DMA mapping, or something). For the IOMMU code to reset individual devices, just because they still have an active DMA mapping even if they're not *doing* DMA, seems wrong. You'll even end up resetting devices just because they have an RMRR, won't you? (Although I wouldn't lose any sleep over that, I suppose. In fact it might be a *feature*... :) -- David WoodhouseOpen Source Technology Centre david.woodho...@intel.com Intel Corporation smime.p7s Description: S/MIME cryptographic signature
Re: [PATCH] intel-iommu: Quiesce devices before disabling IOMMU
On Wed, 2013-08-21 at 16:15 +0900, Takao Indoh wrote: This causes problem on kdump. Devices are working in first kernel, and after switching to second kernel and initializing IOMMU, many DMAR faults occur and it causes problems like driver error or PCI SERR, at last kdump fails. This patch fixes this problem. I'm not sure I'd call this a fix. If the driver is so broken that it cannot get the device working again after a fault, surely the driver needs to be fixed? If the system is suffering an IRQ storm because device doesn't give up after the first few faults, then we should switch off the fault *reporting* for that device so that its faults get ignored (until it next actually sets up a DMA mapping, or something). For the IOMMU code to reset individual devices, just because they still have an active DMA mapping even if they're not *doing* DMA, seems wrong. You'll even end up resetting devices just because they have an RMRR, won't you? (Although I wouldn't lose any sleep over that, I suppose. In fact it might be a *feature*... :) -- David WoodhouseOpen Source Technology Centre david.woodho...@intel.com Intel Corporation smime.p7s Description: S/MIME cryptographic signature
Re: [PATCH] intel-iommu: Quiesce devices before disabling IOMMU
(2013/09/09 18:07), David Woodhouse wrote: On Wed, 2013-08-21 at 16:15 +0900, Takao Indoh wrote: This causes problem on kdump. Devices are working in first kernel, and after switching to second kernel and initializing IOMMU, many DMAR faults occur and it causes problems like driver error or PCI SERR, at last kdump fails. This patch fixes this problem. I'm not sure I'd call this a fix. If the driver is so broken that it cannot get the device working again after a fault, surely the driver needs to be fixed? Yes,this problem may be solved by fixing driver. Actually megaraid sas driver is recently fixed for this problem. (See commit 6431f5d7) But I think root cause of this problem is initializing IOMMU while DMA is still working, and I want to solve the root cause rather than handling it in each driver, otherwise we have to fix driver each time we find this kind of problem. If the system is suffering an IRQ storm because device doesn't give up after the first few faults, then we should switch off the fault *reporting* for that device so that its faults get ignored (until it next actually sets up a DMA mapping, or something). In such a case, yeah limiting messages is enough. For the IOMMU code to reset individual devices, just because they still have an active DMA mapping even if they're not *doing* DMA, seems wrong. You'll even end up resetting devices just because they have an RMRR, won't you? (Although I wouldn't lose any sleep over that, I suppose. In fact it might be a *feature*... :) Right, current code is resetting devices which *may* be doing DMA. The ideal way is finding devices which are actually doing DMA and reset only them but I don't know how we can do this, though I think current code is sufficient. Thanks, Takao Indoh -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] intel-iommu: Quiesce devices before disabling IOMMU
(2013/09/09 13:28), Takao Indoh wrote: >> BTW, what's the status of Alex's PCI patchset which this patch depends >> on? > > It is merged to Bjorn's tree, will be in v3.12. This was wrong. Alex's patchset is already in Linus tree, v3.11. Thanks, Takao Indoh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] intel-iommu: Quiesce devices before disabling IOMMU
(2013/09/08 20:47), Baoquan He wrote: > Hi, > Patch is great and works on my HP-z420. Thank you for your test! > There are several small concerns, please see inline comments. > > On 08/21/13 at 04:15pm, Takao Indoh wrote: >> This patch quiesces devices before disabling IOMMU on boot to stop >> ongoing DMA. In intel_iommu_init(), check context entries and if there >> is entry whose present bit is set then reset corresponding device. >> >> When IOMMU is already enabled on boot, it is disabled and new DMAR table >> is created and then re-enabled in intel_iommu_init(). This causes DMAR >> faults if there are in-flight DMAs. >> >> This causes problem on kdump. Devices are working in first kernel, and >> after switching to second kernel and initializing IOMMU, many DMAR faults >> occur and it causes problems like driver error or PCI SERR, at last >> kdump fails. This patch fixes this problem. >> >> Signed-off-by: Takao Indoh >> >> >> NOTE: >> To reset devices this patch uses bus reset interface introduced by >> following commits in PCI "next" branch. >> >> 64e8674fbe6bc848333a9b7e19f8cc019dde9eab >> 5c32b35b004f5ef70dcf62bbc42b8bed1e50b471 >> 2e35afaefe64946caaecfacaf7fb568e46185e88 >> 608c388122c72e1bf11ba8113434eb3d0c40c32d >> 77cb985ad4acbe66a92ead1bb826deffa47dd33f >> 090a3c5322e900f468b3205b76d0837003ad57b2 >> a6cbaadea0af9b4aa6eee2882f2aa761ab91a4f8 >> de0c548c33429cc78fd47a3c190c6d00b0e4e441 >> 1b95ce8fc9c12fdb60047f2f9950f29e76e7c66d >> --- >> drivers/iommu/intel-iommu.c | 55 >> ++- >> 1 files changed, 54 insertions(+), 1 deletions(-) >> >> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c >> index eec0d3e..efb98eb 100644 >> --- a/drivers/iommu/intel-iommu.c >> +++ b/drivers/iommu/intel-iommu.c >> @@ -3663,6 +3663,56 @@ static struct notifier_block device_nb = { >> .notifier_call = device_notifier, >> }; >> >> +/* Reset PCI devices if its entry exists in DMAR table */ >> +static void __init iommu_reset_devices(struct intel_iommu *iommu, u16 >> segment) >> +{ >> +u64 addr; >> +struct root_entry *root; >> +struct context_entry *context; >> +int bus, devfn; >> +struct pci_dev *dev; >> + >> +addr = dmar_readq(iommu->reg + DMAR_RTADDR_REG); >> +if (!addr) >> +return; >> + >> +/* >> + * In the case of kdump, ioremap is needed because root-entry table >> + * exists in first kernel's memory area which is not mapped in second >> + * kernel >> + */ >> +root = (struct root_entry *)ioremap(addr, PAGE_SIZE); >> +if (!root) >> +return; >> + >> +for (bus = 0; bus < ROOT_ENTRY_NR; bus++) { >> +if (!root_present([bus])) >> +continue; >> + >> +context = (struct context_entry *)ioremap( >> +root[bus].val & VTD_PAGE_MASK, PAGE_SIZE); >> +if (!context) >> +continue; >> + >> +for (devfn = 0; devfn < ROOT_ENTRY_NR; devfn++) { > For context_entry table, the context_entry has the same size as > root_entry currently, so it's also correct to use ROOT_ENTRY_NR. But for > future extention and removing confusion, can we define a new MACRO on > calculating the size of context_entry table, e.g CONTEXT_ENTRY_NR? That makes sense, will do in next version. > >> +if (!context_present([devfn])) >> +continue; >> + >> +dev = pci_get_domain_bus_and_slot(segment, bus, devfn); >> +if (!dev) >> +continue; >> + >> +if (!pci_reset_bus(dev->bus)) /* go to next bus */ > > Here, we have got the specific dev, why don't we just call > pci_reset_function? If call pci_reset_bus here, will it repeat resetting > devices on the same bus many times? pci_reset_bus() can reset all devices on the same bus at the same time. I think it is better than calling pci_reset_function() many times for each device/function. When pci_reset_bus() returns 0 (reset succeeded), we can immediately go out of devfn loop by "break" and go to next bus loop. > >> +break; >> +else /* Try per-function reset */ >> +pci_reset_function(dev); >> + >> +} >> +iounmap(context); >> +} >> +iounmap(root); >> +} >> + >> int __init intel_iommu_init(void) >> { >> int ret = 0; >> @@ -3687,8 +3737,11 @@ int __init intel_iommu_init(void) >> continue; >> >> iommu = drhd->iommu; >> -if (iommu->gcmd & DMA_GCMD_TE) >> +if (iommu->gcmd & DMA_GCMD_TE) { >> +if (reset_devices) >> +iommu_reset_devices(iommu, drhd->segment); > > I remember the double reset issue vivek concerned in the old patch. Here > the iommu reset is done at the very beginning of pci_iommu_init, it's
Re: [PATCH] intel-iommu: Quiesce devices before disabling IOMMU
Hi, Patch is great and works on my HP-z420. There are several small concerns, please see inline comments. On 08/21/13 at 04:15pm, Takao Indoh wrote: > This patch quiesces devices before disabling IOMMU on boot to stop > ongoing DMA. In intel_iommu_init(), check context entries and if there > is entry whose present bit is set then reset corresponding device. > > When IOMMU is already enabled on boot, it is disabled and new DMAR table > is created and then re-enabled in intel_iommu_init(). This causes DMAR > faults if there are in-flight DMAs. > > This causes problem on kdump. Devices are working in first kernel, and > after switching to second kernel and initializing IOMMU, many DMAR faults > occur and it causes problems like driver error or PCI SERR, at last > kdump fails. This patch fixes this problem. > > Signed-off-by: Takao Indoh > > > NOTE: > To reset devices this patch uses bus reset interface introduced by > following commits in PCI "next" branch. > > 64e8674fbe6bc848333a9b7e19f8cc019dde9eab > 5c32b35b004f5ef70dcf62bbc42b8bed1e50b471 > 2e35afaefe64946caaecfacaf7fb568e46185e88 > 608c388122c72e1bf11ba8113434eb3d0c40c32d > 77cb985ad4acbe66a92ead1bb826deffa47dd33f > 090a3c5322e900f468b3205b76d0837003ad57b2 > a6cbaadea0af9b4aa6eee2882f2aa761ab91a4f8 > de0c548c33429cc78fd47a3c190c6d00b0e4e441 > 1b95ce8fc9c12fdb60047f2f9950f29e76e7c66d > --- > drivers/iommu/intel-iommu.c | 55 > ++- > 1 files changed, 54 insertions(+), 1 deletions(-) > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c > index eec0d3e..efb98eb 100644 > --- a/drivers/iommu/intel-iommu.c > +++ b/drivers/iommu/intel-iommu.c > @@ -3663,6 +3663,56 @@ static struct notifier_block device_nb = { > .notifier_call = device_notifier, > }; > > +/* Reset PCI devices if its entry exists in DMAR table */ > +static void __init iommu_reset_devices(struct intel_iommu *iommu, u16 > segment) > +{ > + u64 addr; > + struct root_entry *root; > + struct context_entry *context; > + int bus, devfn; > + struct pci_dev *dev; > + > + addr = dmar_readq(iommu->reg + DMAR_RTADDR_REG); > + if (!addr) > + return; > + > + /* > + * In the case of kdump, ioremap is needed because root-entry table > + * exists in first kernel's memory area which is not mapped in second > + * kernel > + */ > + root = (struct root_entry *)ioremap(addr, PAGE_SIZE); > + if (!root) > + return; > + > + for (bus = 0; bus < ROOT_ENTRY_NR; bus++) { > + if (!root_present([bus])) > + continue; > + > + context = (struct context_entry *)ioremap( > + root[bus].val & VTD_PAGE_MASK, PAGE_SIZE); > + if (!context) > + continue; > + > + for (devfn = 0; devfn < ROOT_ENTRY_NR; devfn++) { For context_entry table, the context_entry has the same size as root_entry currently, so it's also correct to use ROOT_ENTRY_NR. But for future extention and removing confusion, can we define a new MACRO on calculating the size of context_entry table, e.g CONTEXT_ENTRY_NR? > + if (!context_present([devfn])) > + continue; > + > + dev = pci_get_domain_bus_and_slot(segment, bus, devfn); > + if (!dev) > + continue; > + > + if (!pci_reset_bus(dev->bus)) /* go to next bus */ Here, we have got the specific dev, why don't we just call pci_reset_function? If call pci_reset_bus here, will it repeat resetting devices on the same bus many times? > + break; > + else /* Try per-function reset */ > + pci_reset_function(dev); > + > + } > + iounmap(context); > + } > + iounmap(root); > +} > + > int __init intel_iommu_init(void) > { > int ret = 0; > @@ -3687,8 +3737,11 @@ int __init intel_iommu_init(void) > continue; > > iommu = drhd->iommu; > - if (iommu->gcmd & DMA_GCMD_TE) > + if (iommu->gcmd & DMA_GCMD_TE) { > + if (reset_devices) > + iommu_reset_devices(iommu, drhd->segment); I remember the double reset issue vivek concerned in the old patch. Here the iommu reset is done at the very beginning of pci_iommu_init, it's after the bus subsystem init, it means here only iommu reset, am I right? If yes, I think this patch is clear and logic is simple. BTW, what's the status of Alex's PCI patchset which this patch depends on? Baoquan Thanks > iommu_disable_translation(iommu); > + } > } > > if (dmar_dev_scope_init() < 0) { > -- > 1.7.1 > > > > ___ > kexec mailing list > ke...@lists.infradead.org >
Re: [PATCH] intel-iommu: Quiesce devices before disabling IOMMU
Hi, Patch is great and works on my HP-z420. There are several small concerns, please see inline comments. On 08/21/13 at 04:15pm, Takao Indoh wrote: This patch quiesces devices before disabling IOMMU on boot to stop ongoing DMA. In intel_iommu_init(), check context entries and if there is entry whose present bit is set then reset corresponding device. When IOMMU is already enabled on boot, it is disabled and new DMAR table is created and then re-enabled in intel_iommu_init(). This causes DMAR faults if there are in-flight DMAs. This causes problem on kdump. Devices are working in first kernel, and after switching to second kernel and initializing IOMMU, many DMAR faults occur and it causes problems like driver error or PCI SERR, at last kdump fails. This patch fixes this problem. Signed-off-by: Takao Indoh indou.ta...@jp.fujitsu.com NOTE: To reset devices this patch uses bus reset interface introduced by following commits in PCI next branch. 64e8674fbe6bc848333a9b7e19f8cc019dde9eab 5c32b35b004f5ef70dcf62bbc42b8bed1e50b471 2e35afaefe64946caaecfacaf7fb568e46185e88 608c388122c72e1bf11ba8113434eb3d0c40c32d 77cb985ad4acbe66a92ead1bb826deffa47dd33f 090a3c5322e900f468b3205b76d0837003ad57b2 a6cbaadea0af9b4aa6eee2882f2aa761ab91a4f8 de0c548c33429cc78fd47a3c190c6d00b0e4e441 1b95ce8fc9c12fdb60047f2f9950f29e76e7c66d --- drivers/iommu/intel-iommu.c | 55 ++- 1 files changed, 54 insertions(+), 1 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index eec0d3e..efb98eb 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -3663,6 +3663,56 @@ static struct notifier_block device_nb = { .notifier_call = device_notifier, }; +/* Reset PCI devices if its entry exists in DMAR table */ +static void __init iommu_reset_devices(struct intel_iommu *iommu, u16 segment) +{ + u64 addr; + struct root_entry *root; + struct context_entry *context; + int bus, devfn; + struct pci_dev *dev; + + addr = dmar_readq(iommu-reg + DMAR_RTADDR_REG); + if (!addr) + return; + + /* + * In the case of kdump, ioremap is needed because root-entry table + * exists in first kernel's memory area which is not mapped in second + * kernel + */ + root = (struct root_entry *)ioremap(addr, PAGE_SIZE); + if (!root) + return; + + for (bus = 0; bus ROOT_ENTRY_NR; bus++) { + if (!root_present(root[bus])) + continue; + + context = (struct context_entry *)ioremap( + root[bus].val VTD_PAGE_MASK, PAGE_SIZE); + if (!context) + continue; + + for (devfn = 0; devfn ROOT_ENTRY_NR; devfn++) { For context_entry table, the context_entry has the same size as root_entry currently, so it's also correct to use ROOT_ENTRY_NR. But for future extention and removing confusion, can we define a new MACRO on calculating the size of context_entry table, e.g CONTEXT_ENTRY_NR? + if (!context_present(context[devfn])) + continue; + + dev = pci_get_domain_bus_and_slot(segment, bus, devfn); + if (!dev) + continue; + + if (!pci_reset_bus(dev-bus)) /* go to next bus */ Here, we have got the specific dev, why don't we just call pci_reset_function? If call pci_reset_bus here, will it repeat resetting devices on the same bus many times? + break; + else /* Try per-function reset */ + pci_reset_function(dev); + + } + iounmap(context); + } + iounmap(root); +} + int __init intel_iommu_init(void) { int ret = 0; @@ -3687,8 +3737,11 @@ int __init intel_iommu_init(void) continue; iommu = drhd-iommu; - if (iommu-gcmd DMA_GCMD_TE) + if (iommu-gcmd DMA_GCMD_TE) { + if (reset_devices) + iommu_reset_devices(iommu, drhd-segment); I remember the double reset issue vivek concerned in the old patch. Here the iommu reset is done at the very beginning of pci_iommu_init, it's after the bus subsystem init, it means here only iommu reset, am I right? If yes, I think this patch is clear and logic is simple. BTW, what's the status of Alex's PCI patchset which this patch depends on? Baoquan Thanks iommu_disable_translation(iommu); + } } if (dmar_dev_scope_init() 0) { -- 1.7.1 ___ kexec mailing list ke...@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec -- To unsubscribe from this list: send the line unsubscribe
Re: [PATCH] intel-iommu: Quiesce devices before disabling IOMMU
(2013/09/08 20:47), Baoquan He wrote: Hi, Patch is great and works on my HP-z420. Thank you for your test! There are several small concerns, please see inline comments. On 08/21/13 at 04:15pm, Takao Indoh wrote: This patch quiesces devices before disabling IOMMU on boot to stop ongoing DMA. In intel_iommu_init(), check context entries and if there is entry whose present bit is set then reset corresponding device. When IOMMU is already enabled on boot, it is disabled and new DMAR table is created and then re-enabled in intel_iommu_init(). This causes DMAR faults if there are in-flight DMAs. This causes problem on kdump. Devices are working in first kernel, and after switching to second kernel and initializing IOMMU, many DMAR faults occur and it causes problems like driver error or PCI SERR, at last kdump fails. This patch fixes this problem. Signed-off-by: Takao Indoh indou.ta...@jp.fujitsu.com NOTE: To reset devices this patch uses bus reset interface introduced by following commits in PCI next branch. 64e8674fbe6bc848333a9b7e19f8cc019dde9eab 5c32b35b004f5ef70dcf62bbc42b8bed1e50b471 2e35afaefe64946caaecfacaf7fb568e46185e88 608c388122c72e1bf11ba8113434eb3d0c40c32d 77cb985ad4acbe66a92ead1bb826deffa47dd33f 090a3c5322e900f468b3205b76d0837003ad57b2 a6cbaadea0af9b4aa6eee2882f2aa761ab91a4f8 de0c548c33429cc78fd47a3c190c6d00b0e4e441 1b95ce8fc9c12fdb60047f2f9950f29e76e7c66d --- drivers/iommu/intel-iommu.c | 55 ++- 1 files changed, 54 insertions(+), 1 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index eec0d3e..efb98eb 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -3663,6 +3663,56 @@ static struct notifier_block device_nb = { .notifier_call = device_notifier, }; +/* Reset PCI devices if its entry exists in DMAR table */ +static void __init iommu_reset_devices(struct intel_iommu *iommu, u16 segment) +{ +u64 addr; +struct root_entry *root; +struct context_entry *context; +int bus, devfn; +struct pci_dev *dev; + +addr = dmar_readq(iommu-reg + DMAR_RTADDR_REG); +if (!addr) +return; + +/* + * In the case of kdump, ioremap is needed because root-entry table + * exists in first kernel's memory area which is not mapped in second + * kernel + */ +root = (struct root_entry *)ioremap(addr, PAGE_SIZE); +if (!root) +return; + +for (bus = 0; bus ROOT_ENTRY_NR; bus++) { +if (!root_present(root[bus])) +continue; + +context = (struct context_entry *)ioremap( +root[bus].val VTD_PAGE_MASK, PAGE_SIZE); +if (!context) +continue; + +for (devfn = 0; devfn ROOT_ENTRY_NR; devfn++) { For context_entry table, the context_entry has the same size as root_entry currently, so it's also correct to use ROOT_ENTRY_NR. But for future extention and removing confusion, can we define a new MACRO on calculating the size of context_entry table, e.g CONTEXT_ENTRY_NR? That makes sense, will do in next version. +if (!context_present(context[devfn])) +continue; + +dev = pci_get_domain_bus_and_slot(segment, bus, devfn); +if (!dev) +continue; + +if (!pci_reset_bus(dev-bus)) /* go to next bus */ Here, we have got the specific dev, why don't we just call pci_reset_function? If call pci_reset_bus here, will it repeat resetting devices on the same bus many times? pci_reset_bus() can reset all devices on the same bus at the same time. I think it is better than calling pci_reset_function() many times for each device/function. When pci_reset_bus() returns 0 (reset succeeded), we can immediately go out of devfn loop by break and go to next bus loop. +break; +else /* Try per-function reset */ +pci_reset_function(dev); + +} +iounmap(context); +} +iounmap(root); +} + int __init intel_iommu_init(void) { int ret = 0; @@ -3687,8 +3737,11 @@ int __init intel_iommu_init(void) continue; iommu = drhd-iommu; -if (iommu-gcmd DMA_GCMD_TE) +if (iommu-gcmd DMA_GCMD_TE) { +if (reset_devices) +iommu_reset_devices(iommu, drhd-segment); I remember the double reset issue vivek concerned in the old patch. Here the iommu reset is done at the very beginning of pci_iommu_init, it's after the bus subsystem init, it means here only iommu reset, am I right? Yes, exactly. If yes, I think this patch is clear and logic is simple. BTW, what's the status of Alex's PCI patchset
Re: [PATCH] intel-iommu: Quiesce devices before disabling IOMMU
(2013/09/09 13:28), Takao Indoh wrote: BTW, what's the status of Alex's PCI patchset which this patch depends on? It is merged to Bjorn's tree, will be in v3.12. This was wrong. Alex's patchset is already in Linus tree, v3.11. Thanks, Takao Indoh -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] intel-iommu: Quiesce devices before disabling IOMMU
This patch quiesces devices before disabling IOMMU on boot to stop ongoing DMA. In intel_iommu_init(), check context entries and if there is entry whose present bit is set then reset corresponding device. When IOMMU is already enabled on boot, it is disabled and new DMAR table is created and then re-enabled in intel_iommu_init(). This causes DMAR faults if there are in-flight DMAs. This causes problem on kdump. Devices are working in first kernel, and after switching to second kernel and initializing IOMMU, many DMAR faults occur and it causes problems like driver error or PCI SERR, at last kdump fails. This patch fixes this problem. Signed-off-by: Takao Indoh NOTE: To reset devices this patch uses bus reset interface introduced by following commits in PCI "next" branch. 64e8674fbe6bc848333a9b7e19f8cc019dde9eab 5c32b35b004f5ef70dcf62bbc42b8bed1e50b471 2e35afaefe64946caaecfacaf7fb568e46185e88 608c388122c72e1bf11ba8113434eb3d0c40c32d 77cb985ad4acbe66a92ead1bb826deffa47dd33f 090a3c5322e900f468b3205b76d0837003ad57b2 a6cbaadea0af9b4aa6eee2882f2aa761ab91a4f8 de0c548c33429cc78fd47a3c190c6d00b0e4e441 1b95ce8fc9c12fdb60047f2f9950f29e76e7c66d --- drivers/iommu/intel-iommu.c | 55 ++- 1 files changed, 54 insertions(+), 1 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index eec0d3e..efb98eb 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -3663,6 +3663,56 @@ static struct notifier_block device_nb = { .notifier_call = device_notifier, }; +/* Reset PCI devices if its entry exists in DMAR table */ +static void __init iommu_reset_devices(struct intel_iommu *iommu, u16 segment) +{ + u64 addr; + struct root_entry *root; + struct context_entry *context; + int bus, devfn; + struct pci_dev *dev; + + addr = dmar_readq(iommu->reg + DMAR_RTADDR_REG); + if (!addr) + return; + + /* +* In the case of kdump, ioremap is needed because root-entry table +* exists in first kernel's memory area which is not mapped in second +* kernel +*/ + root = (struct root_entry *)ioremap(addr, PAGE_SIZE); + if (!root) + return; + + for (bus = 0; bus < ROOT_ENTRY_NR; bus++) { + if (!root_present([bus])) + continue; + + context = (struct context_entry *)ioremap( + root[bus].val & VTD_PAGE_MASK, PAGE_SIZE); + if (!context) + continue; + + for (devfn = 0; devfn < ROOT_ENTRY_NR; devfn++) { + if (!context_present([devfn])) + continue; + + dev = pci_get_domain_bus_and_slot(segment, bus, devfn); + if (!dev) + continue; + + if (!pci_reset_bus(dev->bus)) /* go to next bus */ + break; + else /* Try per-function reset */ + pci_reset_function(dev); + + } + iounmap(context); + } + iounmap(root); +} + int __init intel_iommu_init(void) { int ret = 0; @@ -3687,8 +3737,11 @@ int __init intel_iommu_init(void) continue; iommu = drhd->iommu; - if (iommu->gcmd & DMA_GCMD_TE) + if (iommu->gcmd & DMA_GCMD_TE) { + if (reset_devices) + iommu_reset_devices(iommu, drhd->segment); iommu_disable_translation(iommu); + } } if (dmar_dev_scope_init() < 0) { -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] intel-iommu: Quiesce devices before disabling IOMMU
This patch quiesces devices before disabling IOMMU on boot to stop ongoing DMA. In intel_iommu_init(), check context entries and if there is entry whose present bit is set then reset corresponding device. When IOMMU is already enabled on boot, it is disabled and new DMAR table is created and then re-enabled in intel_iommu_init(). This causes DMAR faults if there are in-flight DMAs. This causes problem on kdump. Devices are working in first kernel, and after switching to second kernel and initializing IOMMU, many DMAR faults occur and it causes problems like driver error or PCI SERR, at last kdump fails. This patch fixes this problem. Signed-off-by: Takao Indoh indou.ta...@jp.fujitsu.com NOTE: To reset devices this patch uses bus reset interface introduced by following commits in PCI next branch. 64e8674fbe6bc848333a9b7e19f8cc019dde9eab 5c32b35b004f5ef70dcf62bbc42b8bed1e50b471 2e35afaefe64946caaecfacaf7fb568e46185e88 608c388122c72e1bf11ba8113434eb3d0c40c32d 77cb985ad4acbe66a92ead1bb826deffa47dd33f 090a3c5322e900f468b3205b76d0837003ad57b2 a6cbaadea0af9b4aa6eee2882f2aa761ab91a4f8 de0c548c33429cc78fd47a3c190c6d00b0e4e441 1b95ce8fc9c12fdb60047f2f9950f29e76e7c66d --- drivers/iommu/intel-iommu.c | 55 ++- 1 files changed, 54 insertions(+), 1 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index eec0d3e..efb98eb 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -3663,6 +3663,56 @@ static struct notifier_block device_nb = { .notifier_call = device_notifier, }; +/* Reset PCI devices if its entry exists in DMAR table */ +static void __init iommu_reset_devices(struct intel_iommu *iommu, u16 segment) +{ + u64 addr; + struct root_entry *root; + struct context_entry *context; + int bus, devfn; + struct pci_dev *dev; + + addr = dmar_readq(iommu-reg + DMAR_RTADDR_REG); + if (!addr) + return; + + /* +* In the case of kdump, ioremap is needed because root-entry table +* exists in first kernel's memory area which is not mapped in second +* kernel +*/ + root = (struct root_entry *)ioremap(addr, PAGE_SIZE); + if (!root) + return; + + for (bus = 0; bus ROOT_ENTRY_NR; bus++) { + if (!root_present(root[bus])) + continue; + + context = (struct context_entry *)ioremap( + root[bus].val VTD_PAGE_MASK, PAGE_SIZE); + if (!context) + continue; + + for (devfn = 0; devfn ROOT_ENTRY_NR; devfn++) { + if (!context_present(context[devfn])) + continue; + + dev = pci_get_domain_bus_and_slot(segment, bus, devfn); + if (!dev) + continue; + + if (!pci_reset_bus(dev-bus)) /* go to next bus */ + break; + else /* Try per-function reset */ + pci_reset_function(dev); + + } + iounmap(context); + } + iounmap(root); +} + int __init intel_iommu_init(void) { int ret = 0; @@ -3687,8 +3737,11 @@ int __init intel_iommu_init(void) continue; iommu = drhd-iommu; - if (iommu-gcmd DMA_GCMD_TE) + if (iommu-gcmd DMA_GCMD_TE) { + if (reset_devices) + iommu_reset_devices(iommu, drhd-segment); iommu_disable_translation(iommu); + } } if (dmar_dev_scope_init() 0) { -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/