Re: [PATCH] iommu/vt-d: Ratelimit fault handler
On Tue, 15 Mar 2016 19:47:56 + David Woodhousewrote: > On Tue, 2016-03-15 at 10:35 -0600, Alex Williamson wrote: > > Fault rates can easily overwhelm the console and make the system > > unresponsive. Ratelimit to allow an opportunity for maintenance. > > > > Signed-off-by: Alex Williamson > > Rather than just rate-limiting the printk, I'd prefer to handle this > explicitly. There's a bit in the context-entry which can tell the IOMMU > not to bother raising an interrupt at all. And then we can re-enable it > if/when the driver recovers the device. (Or perhaps just when it next > does a mapping). Seems like we need to keep statistics per context entry for that, are you prepared for that sort of overhead? IME, a device that's spewing faults at this rate is broken to the point where it needs to be removed from the system or is actively being tested and debugged for driver or assignment work. In those case, I think we want to keep reminding the user that something is very wrong and it probably explains why the device isn't working properly. If the device is using the DMA API, maybe clearing FPD on each mapping event is a way to do that, but an IOMMU API managed device might have very long lived mapping entries. It seems impractical to setup statistics per context entry and timers to check back on them for things that really ought to be rare events. My goal was only to reduce the overall impact on the system so that it's usable when this occurs. > We really ought to be reporting faults to drivers too, FWIW. I keep > meaning to take a look at that. Yes, that path has been absent for far too long. Thanks, Alex ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/vt-d: Ratelimit fault handler
On Tue, 2016-03-15 at 10:35 -0600, Alex Williamson wrote: > Fault rates can easily overwhelm the console and make the system > unresponsive. Ratelimit to allow an opportunity for maintenance. A few suggestions: o Use a single ratelimit state. o The multiple lines output are unnecessary and hard to grep in the dmesg output because of inconsistent prefixing as second and subsequent output lines are not prefixed by pr_fmt. o The DMAR prefix on the second block is also unnecessary as it's already prefixed by pr_fmt o Coalesce the formats for easier grep. so maybe: --- drivers/iommu/dmar.c | 28 1 file changed, 16 insertions(+), 12 deletions(-) diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index 8ffd756..59dcaaa 100644 --- a/drivers/iommu/dmar.c +++ b/drivers/iommu/dmar.c @@ -1575,23 +1575,27 @@ static int dmar_fault_do_one(struct intel_iommu *iommu, int type, { const char *reason; int fault_type; + static DEFINE_RATELIMIT_STATE(rs, + DEFAULT_RATELIMIT_INTERVAL, + DEFAULT_RATELIMIT_BURST); + + if (__ratelimit()) + return 0; reason = dmar_get_fault_reason(fault_reason, _type); if (fault_type == INTR_REMAP) - pr_err("INTR-REMAP: Request device [[%02x:%02x.%d] " - "fault index %llx\n" - "INTR-REMAP:[fault reason %02d] %s\n", - (source_id >> 8), PCI_SLOT(source_id & 0xFF), - PCI_FUNC(source_id & 0xFF), addr >> 48, - fault_reason, reason); + pr_err("[INTR-REMAP] Request device [%02x:%02x.%d] fault index %llx [fault reason %02d] %s\n", + source_id >> 8, PCI_SLOT(source_id & 0xFF), + PCI_FUNC(source_id & 0xFF), addr >> 48, + fault_reason, reason); else - pr_err("DMAR:[%s] Request device [%02x:%02x.%d] " - "fault addr %llx \n" - "DMAR:[fault reason %02d] %s\n", - (type ? "DMA Read" : "DMA Write"), - (source_id >> 8), PCI_SLOT(source_id & 0xFF), - PCI_FUNC(source_id & 0xFF), addr, fault_reason, reason); + pr_err("[%s] Request device [%02x:%02x.%d] fault addr %llx [fault reason %02d] %s\n", + type ? "DMA Read" : "DMA Write", + source_id >> 8, PCI_SLOT(source_id & 0xFF), + PCI_FUNC(source_id & 0xFF), addr, + fault_reason, reason); + return 0; } ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/vt-d: Ratelimit fault handler
On Tue, 2016-03-15 at 10:35 -0600, Alex Williamson wrote: > Fault rates can easily overwhelm the console and make the system > unresponsive. Ratelimit to allow an opportunity for maintenance. > > Signed-off-by: Alex WilliamsonRather than just rate-limiting the printk, I'd prefer to handle this explicitly. There's a bit in the context-entry which can tell the IOMMU not to bother raising an interrupt at all. And then we can re-enable it if/when the driver recovers the device. (Or perhaps just when it next does a mapping). We really ought to be reporting faults to drivers too, FWIW. I keep meaning to take a look at that. -- David WoodhouseOpen Source Technology Centre david.woodho...@intel.com Intel Corporation smime.p7s Description: S/MIME cryptographic signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/vt-d: Ratelimit fault handler
On Tue, 2016-03-15 at 10:10 -0700, Joe Perches wrote: > o Use a single ratelimit state. [] > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c [] > + if (__ratelimit()) > + return 0; That of course should be: if (!__ratelimit()) return 0; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] iommu/vt-d: Ratelimit fault handler
Fault rates can easily overwhelm the console and make the system unresponsive. Ratelimit to allow an opportunity for maintenance. Signed-off-by: Alex Williamson--- drivers/iommu/dmar.c | 28 +++- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index 8ffd756..628987e 100644 --- a/drivers/iommu/dmar.c +++ b/drivers/iommu/dmar.c @@ -1579,19 +1579,20 @@ static int dmar_fault_do_one(struct intel_iommu *iommu, int type, reason = dmar_get_fault_reason(fault_reason, _type); if (fault_type == INTR_REMAP) - pr_err("INTR-REMAP: Request device [[%02x:%02x.%d] " - "fault index %llx\n" - "INTR-REMAP:[fault reason %02d] %s\n", - (source_id >> 8), PCI_SLOT(source_id & 0xFF), - PCI_FUNC(source_id & 0xFF), addr >> 48, - fault_reason, reason); + pr_err_ratelimited("INTR-REMAP: Request device [[%02x:%02x.%d] " + "fault index %llx\n" + "INTR-REMAP:[fault reason %02d] %s\n", + (source_id >> 8), PCI_SLOT(source_id & 0xFF), + PCI_FUNC(source_id & 0xFF), addr >> 48, + fault_reason, reason); else - pr_err("DMAR:[%s] Request device [%02x:%02x.%d] " - "fault addr %llx \n" - "DMAR:[fault reason %02d] %s\n", - (type ? "DMA Read" : "DMA Write"), - (source_id >> 8), PCI_SLOT(source_id & 0xFF), - PCI_FUNC(source_id & 0xFF), addr, fault_reason, reason); + pr_err_ratelimited("DMAR:[%s] Request device [%02x:%02x.%d] " + "fault addr %llx \n" + "DMAR:[fault reason %02d] %s\n", + (type ? "DMA Read" : "DMA Write"), + (source_id >> 8), PCI_SLOT(source_id & 0xFF), + PCI_FUNC(source_id & 0xFF), addr, + fault_reason, reason); return 0; } @@ -1606,7 +1607,8 @@ irqreturn_t dmar_fault(int irq, void *dev_id) raw_spin_lock_irqsave(>register_lock, flag); fault_status = readl(iommu->reg + DMAR_FSTS_REG); if (fault_status) - pr_err("DRHD: handling fault status reg %x\n", fault_status); + pr_err_ratelimited("DRHD: handling fault status reg %x\n", + fault_status); /* TBD: ignore advanced fault log currently */ if (!(fault_status & DMA_FSTS_PPF)) ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu