Re: [PATCH v2] iommu/amd: Use report_iommu_fault()

2021-08-21 Thread Lennert Buytenhek
On Thu, Aug 05, 2021 at 11:26:25AM -0500, Suthikulpanit, Suravee wrote:

> Lennert,

Hi Suravee,


> > - EVENT_FLAG_I unset, but the request was a translation request
> >   (EVENT_FLAG_TR set) or the target page was not present
> >   (EVENT_FLAG_PR unset): call report_iommu_fault(), but the RW
> >   bit will be invalid, so don't try to map it to a
> >   IOMMU_FAULT_{READ,WRITE} code
> 
> So, why do we need to call report_iommu_fault() for this case?
> My understanding is we only have IOMMU_FAULT_[READ|WRITE].
> So, if we can't identify whether the DMA is read / write,
> we should not need to call report_iommu_fauilt(), is it?

I don't think that we should just altogether avoid logging the subset
of page faults for which we can't determine the read/write direction
on AMD platforms.

E.g. "access to an unmapped address" (which will have PR=0, and thus we
won't know if it was a read or a write access) is just as much of a page
fault as "write to a read-only page" (which will have PR=1, and thus the
RW bit will be accurate) is, and for RAS purposes, both events are
equally interesting, and important to know about.

It's true that we currently don't have a way of signaling to
report_iommu_fault() (and by extension, to the io_page_fault
tracepoint) that we're not sure whether the offending access was a read
or a write, but I think we can just add a bit to include/linux/iommu.h
to indicate that, something along the lines of:

 /* iommu fault flags */
 #define IOMMU_FAULT_READ0x0
 #define IOMMU_FAULT_WRITE   0x1
+#define IOMMU_FAULT_RW_UNKNOWN  0x2

(Cc'ing Ohad Ben-Cohen, who originally added this API.)

I don't think that it would be a good idea to just not signal the page
faults for which we don't know the read/write direction.


Thanks,
Lennert



> > - EVENT_FLAG_I unset, the request is a transaction request (EVENT_FLAG_TR
> >unset) and the target page was present (EVENT_FLAG_PR set): call
> >report_iommu_fault(), and use the RW bit to set IOMMU_FAULT_{READ,WRITE}
> > 
> > So I don't think we can merge the test for EVENT_FLAG_I with the
> > test for EVENT_FLAG_TR/EVENT_FLAG_PR.
> 
> The only condition that we would report_iommu_fault is
> I=0, TR=0, PR=1, isn't it. So we should be able to just check if PR=1.
> 
> 
> > We could do something like this, if you'd prefer:
> > 
> > #define IS_IOMMU_MEM_TRANSACTION(flags) \
> > (((flags) & EVENT_FLAG_I) == 0)
> > 
> > #define IS_RW_FLAG_VALID(flags) \
> > (((flags) & (EVENT_FLAG_TR | EVENT_FLAG_PR)) == EVENT_FLAG_PR)
> > 
> > #define IS_WRITE_REQUEST(flags) \
> > (IS_RW_FLAG_VALID(flags) && (flags & EVENT_FLAG_RW))
> > 
> > And then do something like:
> > 
> > if (dev_data && IS_IOMMU_MEM_TRANSACTION(flags)) {
> > if (!report_iommu_fault(_data->domain->domain, >dev,
> > address,
> > IS_WRITE_REQUEST(flags) ?
> > IOMMU_FAULT_WRITE : IOMMU_FAULT_READ))
> 
> Actually, IS_WRITE_REQUEST() == 0 could mean:
> - I=0, TR=0, PR=1 and RW=0: This is fine.
> - I=0, (TR=1 or PR=0), and we should not be calling report_iommu_fault() here
>   since we cannot specify READ/WRITE here.
> 
> Thanks,
> Suravee
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH,RFC] iommu/amd: Recover from event log overflow

2021-08-21 Thread Lennert Buytenhek
The AMD IOMMU logs I/O page faults and such to a ring buffer in
system memory, and this ring buffer can overflow.  The AMD IOMMU
spec has the following to say about the interrupt status bit that
signals this overflow condition:

EventOverflow: Event log overflow. RW1C. Reset 0b. 1 = IOMMU
event log overflow has occurred. This bit is set when a new
event is to be written to the event log and there is no usable
entry in the event log, causing the new event information to
be discarded. An interrupt is generated when EventOverflow = 1b
and MMIO Offset 0018h[EventIntEn] = 1b. No new event log
entries are written while this bit is set. Software Note: To
resume logging, clear EventOverflow (W1C), and write a 1 to
MMIO Offset 0018h[EventLogEn].

The AMD IOMMU driver doesn't currently implement this recovery
sequence, meaning that if a ring buffer overflow occurs, logging of
EVT/PPR/GA events will cease entirely.

This patch implements the spec-mandated reset sequence, with the minor
tweak that the hardware seems to want to have a 0 written to MMIO Offset
0018h[EventLogEn] first, before writing an 1 into this field, or the
IOMMU won't actually resume logging events.

Signed-off-by: Lennert Buytenhek 
---
N.B.: I have only tested this change against an older in-house custom
kernel version.  (Where it appears to survive stress testing.)  This
version of the patch was only compile-tested.

This patch makes iommu_feature_{disable,enable}() be called at non-init
time (via amd_iommu_restart_event_logging()), and those functions perform
unguarded read-modify-write sequences on the MMIO_CONTROL_OFFSET register,
and I haven't yet verified whether that is safe to do.

Reproducing this issue is fairly easy if you can easily cause I/O page
faults -- either stick an msleep() in the amd_iommu_int_thread() loop,
or configure a slow serial console (the latter being how this was
initially found).

 drivers/iommu/amd/amd_iommu.h   |  1 +
 drivers/iommu/amd/amd_iommu_types.h |  1 +
 drivers/iommu/amd/init.c| 10 ++
 drivers/iommu/amd/iommu.c   | 10 --
 4 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 416815a525d6..bb95edf74415 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -14,6 +14,7 @@
 extern irqreturn_t amd_iommu_int_thread(int irq, void *data);
 extern irqreturn_t amd_iommu_int_handler(int irq, void *data);
 extern void amd_iommu_apply_erratum_63(u16 devid);
+extern void amd_iommu_restart_event_logging(struct amd_iommu *iommu);
 extern void amd_iommu_reset_cmd_buffer(struct amd_iommu *iommu);
 extern int amd_iommu_init_devices(void);
 extern void amd_iommu_uninit_devices(void);
diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 94c1a7a9876d..34b90f8b2056 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -110,6 +110,7 @@
 #define PASID_MASK 0x
 
 /* MMIO status bits */
+#define MMIO_STATUS_EVT_OVERFLOW_INT_MASK  (1 << 0)
 #define MMIO_STATUS_EVT_INT_MASK   (1 << 1)
 #define MMIO_STATUS_COM_WAIT_INT_MASK  (1 << 2)
 #define MMIO_STATUS_PPR_INT_MASK   (1 << 6)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 46280e6e1535..44d1536474ed 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -639,6 +639,16 @@ static int __init alloc_command_buffer(struct amd_iommu 
*iommu)
return iommu->cmd_buf ? 0 : -ENOMEM;
 }
 
+/*
+ * This function restarts event logging in case the IOMMU experienced
+ * an event log buffer overflow.
+ */
+void amd_iommu_restart_event_logging(struct amd_iommu *iommu)
+{
+   iommu_feature_disable(iommu, CONTROL_EVT_LOG_EN);
+   iommu_feature_enable(iommu, CONTROL_EVT_LOG_EN);
+}
+
 /*
  * This function resets the command buffer if the IOMMU stopped fetching
  * commands from it.
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 811a49a95d04..b04f0cd5ffc2 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -736,7 +736,8 @@ amd_iommu_set_pci_msi_domain(struct device *dev, struct 
amd_iommu *iommu) { }
 #endif /* !CONFIG_IRQ_REMAP */
 
 #define AMD_IOMMU_INT_MASK \
-   (MMIO_STATUS_EVT_INT_MASK | \
+   (MMIO_STATUS_EVT_OVERFLOW_INT_MASK | \
+MMIO_STATUS_EVT_INT_MASK | \
 MMIO_STATUS_PPR_INT_MASK | \
 MMIO_STATUS_GALOG_INT_MASK)
 
@@ -746,7 +747,7 @@ irqreturn_t amd_iommu_int_thread(int irq, void *data)
u32 status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);
 
while (status & AMD_IOMMU_INT_MASK) {
-   /* Enable EVT and PPR and GA interrupts again */
+   /* Enable interrupt sources again */
writel(AMD_IOMMU_INT_MASK,
iom

[PATCH v3] iommu/amd: Use report_iommu_fault()

2021-08-03 Thread Lennert Buytenhek
This patch makes iommu/amd call report_iommu_fault() when an I/O page
fault occurs, which has two effects:

1) It allows device drivers to register a callback to be notified of
   I/O page faults, via the iommu_set_fault_handler() API.

2) It triggers the io_page_fault tracepoint in report_iommu_fault()
   when an I/O page fault occurs.

I'm mainly interested in (2).  We have a daemon with some rasdaemon-like
functionality for handling platform errors, and being able to be notified
of I/O page faults for initiating corrective action is very useful -- and
receiving such events via event tracing is a lot nicer than having to
scrape them from kmsg.

A number of other IOMMU drivers already use report_iommu_fault(), and
I/O page faults on those IOMMUs therefore already seem to trigger this
tracepoint -- but this isn't (yet) the case for AMD-Vi and Intel DMAR.

I copied the logic from the other callers of report_iommu_fault(), where
if that function returns zero, the driver will have handled the fault,
in which case we avoid logging information about the fault to the printk
buffer from the IOMMU driver.

With this patch I see io_page_fault event tracing entries as expected:

   irq/24-AMD-Vi-48[002]    978.554289: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482640 flags=0x
   irq/24-AMD-Vi-48[002]    978.554294: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482650 flags=0x
   irq/24-AMD-Vi-48[002]    978.554299: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482660 flags=0x
   irq/24-AMD-Vi-48[002]    978.554305: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482670 flags=0x
   irq/24-AMD-Vi-48[002]    978.554310: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482680 flags=0x
   irq/24-AMD-Vi-48[002]    978.554315: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x914826a0 flags=0x

For determining IOMMU_FAULT_{READ,WRITE}, I followed the AMD IOMMU
spec, but I haven't tested that bit of the code, as the page faults I
encounter are all to non-present (!EVENT_FLAG_PR) mappings, in which
case EVENT_FLAG_RW doesn't make sense.

Signed-off-by: Lennert Buytenhek 
---
Changes for v3:
- Test fault flags via macros.  (Suggested by Suravee Suthikulpanit.)

Changes for v2:
- Don't call report_iommu_fault() for IRQ remapping faults.
  (Suggested by Joerg Roedel.)

 drivers/iommu/amd/amd_iommu_types.h |  4 
 drivers/iommu/amd/iommu.c   | 29 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 94c1a7a9876d..2f2c6630c24c 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -138,6 +138,10 @@
 #define EVENT_DOMID_MASK_HI0xf
 #define EVENT_FLAGS_MASK   0xfff
 #define EVENT_FLAGS_SHIFT  0x10
+#define EVENT_FLAG_TR  0x100
+#define EVENT_FLAG_RW  0x020
+#define EVENT_FLAG_PR  0x010
+#define EVENT_FLAG_I   0x008
 
 /* feature control bits */
 #define CONTROL_IOMMU_EN0x00ULL
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a7d6d78147b7..00975b08bd3f 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -473,6 +473,22 @@ static void amd_iommu_report_rmp_fault(volatile u32 *event)
pci_dev_put(pdev);
 }
 
+/*
+ * AMD I/O Virtualization Technology (IOMMU) Specification, revision
+ * 3.00, section 2.5.3 ("IO_PAGE_FAULT Event") says that the RW
+ * ("read-write") bit is only valid if the I/O page fault was caused
+ * by a memory transaction request referencing a page that was marked
+ * present.
+ */
+#define IS_IOMMU_MEM_TRANSACTION(flags)\
+   (((flags) & EVENT_FLAG_I) == 0)
+
+#define IS_RW_FLAG_VALID(flags)\
+   (((flags) & (EVENT_FLAG_TR | EVENT_FLAG_PR)) == EVENT_FLAG_PR)
+
+#define IS_WRITE_REQUEST(flags)\
+   (IS_RW_FLAG_VALID(flags) && ((flags) & EVENT_FLAG_RW))
+
 static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
u64 address, int flags)
 {
@@ -484,6 +500,18 @@ static void amd_iommu_report_page_fault(u16 devid, u16 
domain_id,
if (pdev)
dev_data = dev_iommu_priv_get(>dev);
 
+   /*
+* If this is a DMA fault (for which the I(nterrupt) bit will
+* be unset), allow report_iommu_fault() to prevent logging it.
+*/
+   if (dev_data && IS_IOMMU_MEM_TRANSACTION(flags)) {
+   if (!report_iommu_fault(_data->domain->domain,
+   >dev, address,
+   IS_WRITE_REQUEST(flags) ?
+   IOMMU_FAULT_WRITE : IOMMU_FAULT_READ))
+ 

Re: [PATCH v2] iommu/amd: Use report_iommu_fault()

2021-08-03 Thread Lennert Buytenhek
On Fri, Jul 30, 2021 at 05:32:16AM +0300, Lennert Buytenhek wrote:

> > > This patch makes iommu/amd call report_iommu_fault() when an I/O page
> > > fault occurs, which has two effects:
> > > 
> > > 1) It allows device drivers to register a callback to be notified of
> > > I/O page faults, via the iommu_set_fault_handler() API.
> > > 
> > > 2) It triggers the io_page_fault tracepoint in report_iommu_fault()
> > > when an I/O page fault occurs.
> > > 
> > > I'm mainly interested in (2).  We have a daemon with some rasdaemon-like
> > > functionality for handling platform errors, and being able to be notified
> > > of I/O page faults for initiating corrective action is very useful -- and
> > > receiving such events via event tracing is a lot nicer than having to
> > > scrape them from kmsg.
> > > 
> > > A number of other IOMMU drivers already use report_iommu_fault(), and
> > > I/O page faults on those IOMMUs therefore already seem to trigger this
> > > tracepoint -- but this isn't (yet) the case for AMD-Vi and Intel DMAR.
> > > 
> > > I copied the logic from the other callers of report_iommu_fault(), where
> > > if that function returns zero, the driver will have handled the fault,
> > > in which case we avoid logging information about the fault to the printk
> > > buffer from the IOMMU driver.
> > > 
> > > With this patch I see io_page_fault event tracing entries as expected:
> > > 
> > > irq/24-AMD-Vi-48[002]    978.554289: io_page_fault: 
> > > IOMMU:[drvname] :05:00.0 iova=0x91482640 flags=0x
> > > irq/24-AMD-Vi-48[002]    978.554294: io_page_fault: 
> > > IOMMU:[drvname] :05:00.0 iova=0x91482650 flags=0x
> > > irq/24-AMD-Vi-48[002]    978.554299: io_page_fault: 
> > > IOMMU:[drvname] :05:00.0 iova=0x91482660 flags=0x
> > > irq/24-AMD-Vi-48[002]    978.554305: io_page_fault: 
> > > IOMMU:[drvname] :05:00.0 iova=0x91482670 flags=0x
> > > irq/24-AMD-Vi-48[002]    978.554310: io_page_fault: 
> > > IOMMU:[drvname] :05:00.0 iova=0x91482680 flags=0x
> > > irq/24-AMD-Vi-48[002]    978.554315: io_page_fault: 
> > > IOMMU:[drvname] :05:00.0 iova=0x914826a0 flags=0x
> > > 
> > > For determining IOMMU_FAULT_{READ,WRITE}, I followed the AMD IOMMU
> > > spec, but I haven't tested that bit of the code, as the page faults I
> > > encounter are all to non-present (!EVENT_FLAG_PR) mappings, in which
> > > case EVENT_FLAG_RW doesn't make sense.
> > > 
> > > Signed-off-by: Lennert Buytenhek 
> > > ---
> > > Changes since v1 RFC:
> > > 
> > > - Don't call report_iommu_fault() for IRQ remapping faults.
> > >(Suggested by Joerg Roedel.)
> > > 
> > >   drivers/iommu/amd/amd_iommu_types.h |  4 
> > >   drivers/iommu/amd/iommu.c   | 29 +
> > >   2 files changed, 33 insertions(+)
> > > 
> > > diff --git a/drivers/iommu/amd/amd_iommu_types.h 
> > > b/drivers/iommu/amd/amd_iommu_types.h
> > > index 94c1a7a9876d..2f2c6630c24c 100644
> > > --- a/drivers/iommu/amd/amd_iommu_types.h
> > > +++ b/drivers/iommu/amd/amd_iommu_types.h
> > > @@ -138,6 +138,10 @@
> > >   #define EVENT_DOMID_MASK_HI 0xf
> > >   #define EVENT_FLAGS_MASK0xfff
> > >   #define EVENT_FLAGS_SHIFT   0x10
> > > +#define EVENT_FLAG_TR0x100
> > > +#define EVENT_FLAG_RW0x020
> > > +#define EVENT_FLAG_PR0x010
> > > +#define EVENT_FLAG_I 0x008
> > >   /* feature control bits */
> > >   #define CONTROL_IOMMU_EN0x00ULL
> > > diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> > > index a7d6d78147b7..d9fb2c22d44a 100644
> > > --- a/drivers/iommu/amd/iommu.c
> > > +++ b/drivers/iommu/amd/iommu.c
> > 
> > What if we introduce:
> > 
> > +/*
> > + * AMD I/O Virtualization Technology (IOMMU) Specification,
> > + * revision 3.00, section 2.5.3 ("IO_PAGE_FAULT Event") says
> > + * that the RW ("read-write") bit is only valid if the I/O
> > + * page fault was caused by a memory transaction request
> > + * referencing a page that was marked present.
> > + */
> > +#define IO_PAGE_FAULT_MEM_MASK \
> > +   (EVENT_FLAG_TR | EVENT_

Re: [PATCH v2] iommu/amd: Use report_iommu_fault()

2021-07-29 Thread Lennert Buytenhek
On Wed, Jul 28, 2021 at 04:51:27PM -0500, Suthikulpanit, Suravee wrote:

> Lennert,

Hi Suravee,


> > This patch makes iommu/amd call report_iommu_fault() when an I/O page
> > fault occurs, which has two effects:
> > 
> > 1) It allows device drivers to register a callback to be notified of
> > I/O page faults, via the iommu_set_fault_handler() API.
> > 
> > 2) It triggers the io_page_fault tracepoint in report_iommu_fault()
> > when an I/O page fault occurs.
> > 
> > I'm mainly interested in (2).  We have a daemon with some rasdaemon-like
> > functionality for handling platform errors, and being able to be notified
> > of I/O page faults for initiating corrective action is very useful -- and
> > receiving such events via event tracing is a lot nicer than having to
> > scrape them from kmsg.
> > 
> > A number of other IOMMU drivers already use report_iommu_fault(), and
> > I/O page faults on those IOMMUs therefore already seem to trigger this
> > tracepoint -- but this isn't (yet) the case for AMD-Vi and Intel DMAR.
> > 
> > I copied the logic from the other callers of report_iommu_fault(), where
> > if that function returns zero, the driver will have handled the fault,
> > in which case we avoid logging information about the fault to the printk
> > buffer from the IOMMU driver.
> > 
> > With this patch I see io_page_fault event tracing entries as expected:
> > 
> > irq/24-AMD-Vi-48[002]    978.554289: io_page_fault: 
> > IOMMU:[drvname] :05:00.0 iova=0x91482640 flags=0x
> > irq/24-AMD-Vi-48[002]    978.554294: io_page_fault: 
> > IOMMU:[drvname] :05:00.0 iova=0x91482650 flags=0x
> > irq/24-AMD-Vi-48[002]    978.554299: io_page_fault: 
> > IOMMU:[drvname] :05:00.0 iova=0x91482660 flags=0x
> > irq/24-AMD-Vi-48[002]    978.554305: io_page_fault: 
> > IOMMU:[drvname] :05:00.0 iova=0x91482670 flags=0x
> > irq/24-AMD-Vi-48[002]    978.554310: io_page_fault: 
> > IOMMU:[drvname] :05:00.0 iova=0x91482680 flags=0x
> > irq/24-AMD-Vi-48[002]    978.554315: io_page_fault: 
> > IOMMU:[drvname] :05:00.0 iova=0x914826a0 flags=0x
> > 
> > For determining IOMMU_FAULT_{READ,WRITE}, I followed the AMD IOMMU
> > spec, but I haven't tested that bit of the code, as the page faults I
> > encounter are all to non-present (!EVENT_FLAG_PR) mappings, in which
> > case EVENT_FLAG_RW doesn't make sense.
> > 
> > Signed-off-by: Lennert Buytenhek 
> > ---
> > Changes since v1 RFC:
> > 
> > - Don't call report_iommu_fault() for IRQ remapping faults.
> >(Suggested by Joerg Roedel.)
> > 
> >   drivers/iommu/amd/amd_iommu_types.h |  4 
> >   drivers/iommu/amd/iommu.c   | 29 +
> >   2 files changed, 33 insertions(+)
> > 
> > diff --git a/drivers/iommu/amd/amd_iommu_types.h 
> > b/drivers/iommu/amd/amd_iommu_types.h
> > index 94c1a7a9876d..2f2c6630c24c 100644
> > --- a/drivers/iommu/amd/amd_iommu_types.h
> > +++ b/drivers/iommu/amd/amd_iommu_types.h
> > @@ -138,6 +138,10 @@
> >   #define EVENT_DOMID_MASK_HI   0xf
> >   #define EVENT_FLAGS_MASK  0xfff
> >   #define EVENT_FLAGS_SHIFT 0x10
> > +#define EVENT_FLAG_TR  0x100
> > +#define EVENT_FLAG_RW  0x020
> > +#define EVENT_FLAG_PR  0x010
> > +#define EVENT_FLAG_I   0x008
> >   /* feature control bits */
> >   #define CONTROL_IOMMU_EN0x00ULL
> > diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> > index a7d6d78147b7..d9fb2c22d44a 100644
> > --- a/drivers/iommu/amd/iommu.c
> > +++ b/drivers/iommu/amd/iommu.c
> 
> What if we introduce:
> 
> +/*
> + * AMD I/O Virtualization Technology (IOMMU) Specification,
> + * revision 3.00, section 2.5.3 ("IO_PAGE_FAULT Event") says
> + * that the RW ("read-write") bit is only valid if the I/O
> + * page fault was caused by a memory transaction request
> + * referencing a page that was marked present.
> + */
> +#define IO_PAGE_FAULT_MEM_MASK \
> +   (EVENT_FLAG_TR | EVENT_FLAG_PR | EVENT_FLAG_I)
> +#define IS_IOMMU_MEM_TRANSACTION(x)\
> +   ((x & IO_PAGE_FAULT_MEM_MASK) == EVENT_FLAG_PR)
> 
> Note that this should have already checked w/ EVENT_FLAG_I == 0.
> 
> 
> > @@ -484,6 +484,34 @@ static void amd_iommu_report_page_fault(u16 devid, u16 
> > domain_id,
> > if (pdev)
> > dev_data = dev_iommu

Re: [PATCH v2] iommu/amd: Use report_iommu_fault()

2021-09-25 Thread Lennert Buytenhek
On Sat, Aug 21, 2021 at 06:44:39PM +0300, Lennert Buytenhek wrote:

> > > - EVENT_FLAG_I unset, but the request was a translation request
> > >   (EVENT_FLAG_TR set) or the target page was not present
> > >   (EVENT_FLAG_PR unset): call report_iommu_fault(), but the RW
> > >   bit will be invalid, so don't try to map it to a
> > >   IOMMU_FAULT_{READ,WRITE} code
> > 
> > So, why do we need to call report_iommu_fault() for this case?
> > My understanding is we only have IOMMU_FAULT_[READ|WRITE].
> > So, if we can't identify whether the DMA is read / write,
> > we should not need to call report_iommu_fauilt(), is it?
> 
> I don't think that we should just altogether avoid logging the subset
> of page faults for which we can't determine the read/write direction
> on AMD platforms.
> 
> E.g. "access to an unmapped address" (which will have PR=0, and thus we
> won't know if it was a read or a write access) is just as much of a page
> fault as "write to a read-only page" (which will have PR=1, and thus the
> RW bit will be accurate) is, and for RAS purposes, both events are
> equally interesting, and important to know about.
> 
> It's true that we currently don't have a way of signaling to
> report_iommu_fault() (and by extension, to the io_page_fault
> tracepoint) that we're not sure whether the offending access was a read
> or a write, but I think we can just add a bit to include/linux/iommu.h
> to indicate that, something along the lines of:
> 
>/* iommu fault flags */
>#define IOMMU_FAULT_READ0x0
>#define IOMMU_FAULT_WRITE   0x1
>   +#define IOMMU_FAULT_RW_UNKNOWN  0x2
> 
> (Cc'ing Ohad Ben-Cohen, who originally added this API.)
> 
> I don't think that it would be a good idea to just not signal the page
> faults for which we don't know the read/write direction.

I had another look at this, and from some testing, it seems that,
contrary to what the datasheet suggests, the RW (read/write direction)
bit in logged I/O page faults is actually accurate for both faulting
accesses to present pages and faulting accesses to non-present pages.

I made a few hacks to the ixgbe driver to intentionally cause several
different types of I/O page faults, thereby turning an ixgbe NIC into
a poor man's I/O page fault generator, and these are the resulting
logged I/O page faults, on a Ryzen 3700X system:

Read from non-present page:
ixgbe :25:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0010 
address=0x1bbc8f040 flags=0x]
=> flags indicate PR(esent)=0, RW=0

Write to non-present page:
ixgbe :25:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0010 
address=0x1bdcb70c0 flags=0x0020]
=> flags indicate PR(esent)=0, RW=1

Read from write-only page:
ixgbe :25:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0010 
address=0xbbcc1c40 flags=0x0050]
=> flags indicate PR(esent)=1, RW=0  (and PE(rmission violation)=1)

Write to read-only page:
ixgbe :25:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0010 
address=0xbdcb70c0 flags=0x0070]
=> flags indicate PR(esent)=1, RW=1  (and PE(rmission violation)=1)

In other words, it seems that the RW bit is reliable even for PR=0 type
faults.

I assume that the restriction mentioned in the docs regarding RW and
PR ("RW is only meaningful when PR=1, TR=0, and I=0" from AMD I/O
Virtualization Technology (IOMMU) Specification, revision 3.00, Table
55: IO_PAGE_FAULT Event Log Buffer Entry Fields) is either confused
or refers to a restriction in older hardware that has since been lifted.

I'll resubmit the patch to unconditionally pass through the RW bit.




ixgbe hacks:

to cause reads from non-present pages (in the TX path):

--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8232,7 +8232,7 @@ static int ixgbe_tx_map(struct ixgbe_ring *tx_ring,
dma_unmap_len_set(tx_buffer, len, size);
dma_unmap_addr_set(tx_buffer, dma, dma);
 
-   tx_desc->read.buffer_addr = cpu_to_le64(dma);
+   tx_desc->read.buffer_addr = cpu_to_le64(dma + BIT(32));
 
while (unlikely(size > IXGBE_MAX_DATA_PER_TXD)) {
tx_desc->read.cmd_type_len =


to cause writes to non-present pages (in the RX path):

--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1604,7 +1604,7 @@ void ixgbe_alloc_rx_buffers(struct ixgbe_ring *rx_ring, 
u16 cleaned_count)
 * Refresh the desc even if buffer_addrs didn't change
 * because each write-back erases this info.
 */
-   rx_desc->read.pkt_addr = cpu_to_le64(bi->dma + bi->page_offset);
+   rx_desc->read.pkt_addr = cpu_to_le64(bi->

[PATCH v4] iommu/amd: Use report_iommu_fault()

2021-09-25 Thread Lennert Buytenhek
This patch makes iommu/amd call report_iommu_fault() when an I/O page
fault occurs, which has two effects:

1) It allows device drivers to register a callback to be notified of
   I/O page faults, via the iommu_set_fault_handler() API.

2) It triggers the io_page_fault tracepoint in report_iommu_fault()
   when an I/O page fault occurs.

The latter point is the main aim of this patch, as it allows
rasdaemon-like daemons to be notified of I/O page faults, and to
possibly initiate corrective action in response.

A number of other IOMMU drivers already use report_iommu_fault(), and
I/O page faults on those IOMMUs therefore already trigger this
tracepoint -- but this isn't yet the case for AMD-Vi and Intel DMAR.

The AMD IOMMU specification suggests that the bit in an I/O page fault
event log entry that signals whether an I/O page fault was for a read
request or for a write request is only meaningful when the faulting
access was to a present page, but some testing on a Ryzen 3700X suggests
that this bit encodes the correct value even for I/O page faults to
non-present pages, and therefore, this patch passes the R/W information
up the stack even for I/O page faults to non-present pages.

Signed-off-by: Lennert Buytenhek 
---
Tested on v5.15-rc2 on a Ryzen 3700X, where it has the desired effect.

Changes for v4:
- Unconditionally pass through RW bit, after testing that suggests 
  that this bit is reliable even for I/O page faults to non-present
  pages.

Changes for v3:
- Test fault flags via macros.  (Suggested by Suravee Suthikulpanit.)

Changes for v2:
- Don't call report_iommu_fault() for IRQ remapping faults.
  (Suggested by Joerg Roedel.)

 drivers/iommu/amd/amd_iommu_types.h |  2 ++
 drivers/iommu/amd/iommu.c   | 19 +++
 2 files changed, 21 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 8dbe61e2b3c1..867535eb0ce9 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -138,6 +138,8 @@
 #define EVENT_DOMID_MASK_HI0xf
 #define EVENT_FLAGS_MASK   0xfff
 #define EVENT_FLAGS_SHIFT  0x10
+#define EVENT_FLAG_RW  0x020
+#define EVENT_FLAG_I   0x008
 
 /* feature control bits */
 #define CONTROL_IOMMU_EN0x00ULL
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 1722bb161841..7b592aba06f5 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -473,6 +473,12 @@ static void amd_iommu_report_rmp_fault(volatile u32 *event)
pci_dev_put(pdev);
 }
 
+#define IS_IOMMU_MEM_TRANSACTION(flags)\
+   (((flags) & EVENT_FLAG_I) == 0)
+
+#define IS_WRITE_REQUEST(flags)\
+   ((flags) & EVENT_FLAG_RW)
+
 static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
u64 address, int flags)
 {
@@ -484,6 +490,18 @@ static void amd_iommu_report_page_fault(u16 devid, u16 
domain_id,
if (pdev)
dev_data = dev_iommu_priv_get(>dev);
 
+   /*
+* If this is a DMA fault (for which the I(nterrupt) bit will
+* be unset), allow report_iommu_fault() to prevent logging it.
+*/
+   if (dev_data && IS_IOMMU_MEM_TRANSACTION(flags)) {
+   if (!report_iommu_fault(_data->domain->domain,
+   >dev, address,
+   IS_WRITE_REQUEST(flags) ?
+   IOMMU_FAULT_WRITE : IOMMU_FAULT_READ))
+   goto out;
+   }
+
if (dev_data) {
if (__ratelimit(_data->rs)) {
pci_err(pdev, "Event logged [IO_PAGE_FAULT 
domain=0x%04x address=0x%llx flags=0x%04x]\n",
@@ -495,6 +513,7 @@ static void amd_iommu_report_page_fault(u16 devid, u16 
domain_id,
domain_id, address, flags);
}
 
+out:
if (pdev)
pci_dev_put(pdev);
 }
-- 
2.31.1
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5] iommu/amd: Use report_iommu_fault()

2021-09-28 Thread Lennert Buytenhek
This patch makes iommu/amd call report_iommu_fault() when an I/O page
fault occurs, which has two effects:

1) It allows device drivers to register a callback to be notified of
   I/O page faults, via the iommu_set_fault_handler() API.

2) It triggers the io_page_fault tracepoint in report_iommu_fault()
   when an I/O page fault occurs.

The latter point is the main aim of this patch, as it allows
rasdaemon-like daemons to be notified of I/O page faults, and to
possibly initiate corrective action in response.

A number of other IOMMU drivers already use report_iommu_fault(), and
I/O page faults on those IOMMUs therefore already trigger this
tracepoint -- but this isn't yet the case for AMD-Vi and Intel DMAR.

The AMD IOMMU specification suggests that the bit in an I/O page fault
event log entry that signals whether an I/O page fault was for a read
request or for a write request is only meaningful when the faulting
access was to a present page, but some testing on a Ryzen 3700X suggests
that this bit encodes the correct value even for I/O page faults to
non-present pages, and therefore, this patch passes the R/W information
up the stack even for I/O page faults to non-present pages.

Signed-off-by: Lennert Buytenhek 
---
Tested on v5.15-rc3 on a Ryzen 3700X and on a Ryzen 5950X, where it
has the desired effect.

Changes for v5:
- Code formatting tweaking.  (Suggested by Joerg Roedel.)

Changes for v4:
- Unconditionally pass through RW bit, after testing that suggests
  that this bit is reliable even for I/O page faults to non-present
  pages.

Changes for v3:
- Test fault flags via macros.  (Suggested by Suravee Suthikulpanit.)

Changes for v2:
- Don't call report_iommu_fault() for IRQ remapping faults.
  (Suggested by Joerg Roedel.)

 drivers/iommu/amd/amd_iommu_types.h |  2 ++
 drivers/iommu/amd/iommu.c   | 21 +
 2 files changed, 23 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 8dbe61e2b3c1..867535eb0ce9 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -138,6 +138,8 @@
 #define EVENT_DOMID_MASK_HI0xf
 #define EVENT_FLAGS_MASK   0xfff
 #define EVENT_FLAGS_SHIFT  0x10
+#define EVENT_FLAG_RW  0x020
+#define EVENT_FLAG_I   0x008
 
 /* feature control bits */
 #define CONTROL_IOMMU_EN0x00ULL
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 1722bb161841..beadcffcc223 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -473,6 +473,12 @@ static void amd_iommu_report_rmp_fault(volatile u32 *event)
pci_dev_put(pdev);
 }
 
+#define IS_IOMMU_MEM_TRANSACTION(flags)\
+   (((flags) & EVENT_FLAG_I) == 0)
+
+#define IS_WRITE_REQUEST(flags)\
+   ((flags) & EVENT_FLAG_RW)
+
 static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
u64 address, int flags)
 {
@@ -485,6 +491,20 @@ static void amd_iommu_report_page_fault(u16 devid, u16 
domain_id,
dev_data = dev_iommu_priv_get(>dev);
 
if (dev_data) {
+   /*
+* If this is a DMA fault (for which the I(nterrupt)
+* bit will be unset), allow report_iommu_fault() to
+* prevent logging it.
+*/
+   if (IS_IOMMU_MEM_TRANSACTION(flags)) {
+   if (!report_iommu_fault(_data->domain->domain,
+   >dev, address,
+   IS_WRITE_REQUEST(flags) ?
+   IOMMU_FAULT_WRITE :
+   IOMMU_FAULT_READ))
+   goto out;
+   }
+
if (__ratelimit(_data->rs)) {
pci_err(pdev, "Event logged [IO_PAGE_FAULT 
domain=0x%04x address=0x%llx flags=0x%04x]\n",
domain_id, address, flags);
@@ -495,6 +515,7 @@ static void amd_iommu_report_page_fault(u16 devid, u16 
domain_id,
domain_id, address, flags);
}
 
+out:
if (pdev)
pci_dev_put(pdev);
 }
-- 
2.31.1
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH,RFC] iommu/amd: Recover from event log overflow

2021-09-30 Thread Lennert Buytenhek
On Tue, Sep 28, 2021 at 10:45:30AM +0200, Joerg Roedel wrote:

> Hi Lennert,

Hi Joerg,

Thanks for your feedback!


> > +/*
> > + * This function restarts event logging in case the IOMMU experienced
> > + * an event log buffer overflow.
> > + */
> > +void amd_iommu_restart_event_logging(struct amd_iommu *iommu)
> > +{
> > +   iommu_feature_disable(iommu, CONTROL_EVT_LOG_EN);
> > +   iommu_feature_enable(iommu, CONTROL_EVT_LOG_EN);
> > +}
> 
> A few more things need to happen here. First of all head and tail are
> likely equal when the event buffer overflows, so the events will not be
> polled and reported.

I applied the following change on top of my patch, to check the state
of the event log ring buffer when we enter the IRQ handler with an
overflow condition pending:

--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -752,6 +752,18 @@ irqreturn_t amd_iommu_int_thread(int irq, void *data)
struct amd_iommu *iommu = (struct amd_iommu *) data;
u32 status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);
 
+   if (status & MMIO_STATUS_EVT_OVERFLOW_INT_MASK) {
+   u32 events;
+
+   events = readl(iommu->mmio_base + MMIO_EVT_TAIL_OFFSET) -
+readl(iommu->mmio_base + MMIO_EVT_HEAD_OFFSET);
+   if (events & 0x8000)
+   events += EVT_BUFFER_SIZE;
+   events /= EVENT_ENTRY_SIZE;
+
+   pr_info("Overflow with %d events queued\n", events);
+   }
+
while (status & AMD_IOMMU_INT_MASK) {
/* Enable interrupt sources again */
writel(AMD_IOMMU_INT_MASK,

And that gives me:

[   33.304821] AMD-Vi: Overflow with 511 events queued
[   35.304812] AMD-Vi: Overflow with 511 events queued
[   39.304791] AMD-Vi: Overflow with 511 events queued
[   40.304792] AMD-Vi: Overflow with 511 events queued
[   41.304782] AMD-Vi: Overflow with 511 events queued
[   44.009920] AMD-Vi: Overflow with 511 events queued
[   45.304768] AMD-Vi: Overflow with 511 events queued
[   46.304782] AMD-Vi: Overflow with 511 events queued
[   46.385028] AMD-Vi: Overflow with 511 events queued
[   51.304733] AMD-Vi: Overflow with 511 events queued
[   53.318892] AMD-Vi: Overflow with 511 events queued
[   60.304681] AMD-Vi: Overflow with 511 events queued
[   63.304676] AMD-Vi: Overflow with 511 events queued

In other words, it seems that the hardware considers the event log to
be full when there's still one free entry in the ring buffer, and it
will not actually fill up the last free entry and make the head and
tail pointer equal by doing so.  This seems consistent across Ryzen
3700X, Ryzen 5950X, EPYC 3151, EPYC 3251, RX-421ND, RX-216TD.

The docs talk about "When the Event Log has overflowed", but they don't
define exactly when that happens (i.e. when tail becomes equal to head or
one entry before that), but I did find this phrase that talks about the
overflow condition:

The host software must make space in the event log after an
overflow by reading entries (by adjusting the head pointer) or
by resizing the log.  Event logging may then be restarted.

This seems to suggest that the hardware will never consume the last
entry in the ring buffer and thereby advance tail to be equal to head,
because if it would, then there would be no need for "reading entries
(by adjusting the head pointer)" to make space in the event log buffer,
because the 'empty' and 'full' conditions would be indistinguishable
in that case.

If there _is_ some variation of the hardware out there that does
advance the tail pointer to coincide with the head pointer when there
are already N-1 entries in the log and it has one more entry to write,
then this patch would still work OK on such hardware -- we would just
lose a few more events in that case than we otherwise would, but the
point of the patch is to be able to recover after a burst of I/O page
faults, at which point we've very likely already lost some events, so
I think such hypothetical lossage would be acceptable, given that none
of the hardware I have access to seems to behave that way and from the
wording in the docs it is unlikely to behave that way.

In fact, thinking about it a bit more, by the time an event log
overflow condition is handled, it is actually possible for the event
log ring buffer to be empty and not full.  Imagine this scenario:

- We enter the IRQ handler, EVT status bit is set, the ring buffer is
  full but it hasn't overflowed yet, so OVERFLOW is not set.

- We read interrupt status and clear the EVT bit.

- Before we call iommu_poll_events(), another event comes in, which
  causes the OVERFLOW flag to be set, since we haven't advanced head yet.

- iommu_poll_events() is called, and it processes all events in the
  ring buffer, which is now empty (and will remain empty, since the
  overflow bit is set).

- The MMIO_STATUS_OFFSET re-reading at the end of the IRQ loop finds
  the OVERFLOW bit set and 

[PATCH v2] iommu/amd: Recover from event log overflow

2021-10-04 Thread Lennert Buytenhek
The AMD IOMMU logs I/O page faults and such to a ring buffer in
system memory, and this ring buffer can overflow.  The AMD IOMMU
spec has the following to say about the interrupt status bit that
signals this overflow condition:

EventOverflow: Event log overflow. RW1C. Reset 0b. 1 = IOMMU
event log overflow has occurred. This bit is set when a new
event is to be written to the event log and there is no usable
entry in the event log, causing the new event information to
be discarded. An interrupt is generated when EventOverflow = 1b
and MMIO Offset 0018h[EventIntEn] = 1b. No new event log
entries are written while this bit is set. Software Note: To
resume logging, clear EventOverflow (W1C), and write a 1 to
MMIO Offset 0018h[EventLogEn].

The AMD IOMMU driver doesn't currently implement this recovery
sequence, meaning that if a ring buffer overflow occurs, logging
of EVT/PPR/GA events will cease entirely.

This patch implements the spec-mandated reset sequence, with the
minor tweak that the hardware seems to want to have a 0 written to
MMIO Offset 0018h[EventLogEn] first, before writing an 1 into this
field, or the IOMMU won't actually resume logging events.

Signed-off-by: Lennert Buytenhek 
---
Tested on v5.15-rc4 on a Ryzen 3700X.

Changes for v2:
- Rate limit event log overflow messages.

 drivers/iommu/amd/amd_iommu.h   |  1 +
 drivers/iommu/amd/amd_iommu_types.h |  1 +
 drivers/iommu/amd/init.c| 10 ++
 drivers/iommu/amd/iommu.c   | 10 --
 4 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 416815a525d6..bb95edf74415 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -14,6 +14,7 @@
 extern irqreturn_t amd_iommu_int_thread(int irq, void *data);
 extern irqreturn_t amd_iommu_int_handler(int irq, void *data);
 extern void amd_iommu_apply_erratum_63(u16 devid);
+extern void amd_iommu_restart_event_logging(struct amd_iommu *iommu);
 extern void amd_iommu_reset_cmd_buffer(struct amd_iommu *iommu);
 extern int amd_iommu_init_devices(void);
 extern void amd_iommu_uninit_devices(void);
diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 8dbe61e2b3c1..095076029f8d 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -110,6 +110,7 @@
 #define PASID_MASK 0x
 
 /* MMIO status bits */
+#define MMIO_STATUS_EVT_OVERFLOW_INT_MASK  (1 << 0)
 #define MMIO_STATUS_EVT_INT_MASK   (1 << 1)
 #define MMIO_STATUS_COM_WAIT_INT_MASK  (1 << 2)
 #define MMIO_STATUS_PPR_INT_MASK   (1 << 6)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 2a822b229bd0..f1c5eb023bda 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -654,6 +654,16 @@ static int __init alloc_command_buffer(struct amd_iommu 
*iommu)
return iommu->cmd_buf ? 0 : -ENOMEM;
 }
 
+/*
+ * This function restarts event logging in case the IOMMU experienced
+ * an event log buffer overflow.
+ */
+void amd_iommu_restart_event_logging(struct amd_iommu *iommu)
+{
+   iommu_feature_disable(iommu, CONTROL_EVT_LOG_EN);
+   iommu_feature_enable(iommu, CONTROL_EVT_LOG_EN);
+}
+
 /*
  * This function resets the command buffer if the IOMMU stopped fetching
  * commands from it.
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 1722bb161841..f46eb7397021 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -742,7 +742,8 @@ amd_iommu_set_pci_msi_domain(struct device *dev, struct 
amd_iommu *iommu) { }
 #endif /* !CONFIG_IRQ_REMAP */
 
 #define AMD_IOMMU_INT_MASK \
-   (MMIO_STATUS_EVT_INT_MASK | \
+   (MMIO_STATUS_EVT_OVERFLOW_INT_MASK | \
+MMIO_STATUS_EVT_INT_MASK | \
 MMIO_STATUS_PPR_INT_MASK | \
 MMIO_STATUS_GALOG_INT_MASK)
 
@@ -752,7 +753,7 @@ irqreturn_t amd_iommu_int_thread(int irq, void *data)
u32 status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);
 
while (status & AMD_IOMMU_INT_MASK) {
-   /* Enable EVT and PPR and GA interrupts again */
+   /* Enable interrupt sources again */
writel(AMD_IOMMU_INT_MASK,
iommu->mmio_base + MMIO_STATUS_OFFSET);
 
@@ -773,6 +774,11 @@ irqreturn_t amd_iommu_int_thread(int irq, void *data)
}
 #endif
 
+   if (status & MMIO_STATUS_EVT_OVERFLOW_INT_MASK) {
+   pr_info_ratelimited("IOMMU event log overflow\n");
+   amd_iommu_restart_event_logging(iommu);
+   }
+
/*
 * Hardware bug: ERBT1312
 * When re-enabling interrupt (by writing 1
-- 
2.31.1
___
iommu mailing list
iommu@lists.lin

[PATCH] iommu/amd: Fix printing of IOMMU events when rate limiting kicks in

2021-07-21 Thread Lennert Buytenhek
For the printing of RMP_HW_ERROR / RMP_PAGE_FAULT / IO_PAGE_FAULT
events, the AMD IOMMU code uses such logic:

if (pdev)
dev_data = dev_iommu_priv_get(>dev);

if (dev_data && __ratelimit(_data->rs)) {
pci_err(pdev, ...
} else {
printk_ratelimit() / pr_err{,_ratelimited}(...
}

This means that if we receive an event for a PCI devid which actually
does have a struct pci_dev and an attached struct iommu_dev_data, but
rate limiting kicks in, we'll fall back to the non-PCI branch of the
test, and print the event in a different format.

Fix this by changing the logic to:

if (dev_data) {
if (__ratelimit(_data->rs)) {
pci_err(pdev, ...
}
} else {
pr_err_ratelimited(...
}

Suggested-by: Suravee Suthikulpanit 
Signed-off-by: Lennert Buytenhek 
---
 drivers/iommu/amd/iommu.c | 28 +---
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 811a49a95d04..a7d6d78147b7 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -425,9 +425,11 @@ static void amd_iommu_report_rmp_hw_error(volatile u32 
*event)
if (pdev)
dev_data = dev_iommu_priv_get(>dev);
 
-   if (dev_data && __ratelimit(_data->rs)) {
-   pci_err(pdev, "Event logged [RMP_HW_ERROR vmg_tag=0x%04x, 
spa=0x%llx, flags=0x%04x]\n",
-   vmg_tag, spa, flags);
+   if (dev_data) {
+   if (__ratelimit(_data->rs)) {
+   pci_err(pdev, "Event logged [RMP_HW_ERROR 
vmg_tag=0x%04x, spa=0x%llx, flags=0x%04x]\n",
+   vmg_tag, spa, flags);
+   }
} else {
pr_err_ratelimited("Event logged [RMP_HW_ERROR 
device=%02x:%02x.%x, vmg_tag=0x%04x, spa=0x%llx, flags=0x%04x]\n",
PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
@@ -456,9 +458,11 @@ static void amd_iommu_report_rmp_fault(volatile u32 *event)
if (pdev)
dev_data = dev_iommu_priv_get(>dev);
 
-   if (dev_data && __ratelimit(_data->rs)) {
-   pci_err(pdev, "Event logged [RMP_PAGE_FAULT vmg_tag=0x%04x, 
gpa=0x%llx, flags_rmp=0x%04x, flags=0x%04x]\n",
-   vmg_tag, gpa, flags_rmp, flags);
+   if (dev_data) {
+   if (__ratelimit(_data->rs)) {
+   pci_err(pdev, "Event logged [RMP_PAGE_FAULT 
vmg_tag=0x%04x, gpa=0x%llx, flags_rmp=0x%04x, flags=0x%04x]\n",
+   vmg_tag, gpa, flags_rmp, flags);
+   }
} else {
pr_err_ratelimited("Event logged [RMP_PAGE_FAULT 
device=%02x:%02x.%x, vmg_tag=0x%04x, gpa=0x%llx, flags_rmp=0x%04x, 
flags=0x%04x]\n",
PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
@@ -480,11 +484,13 @@ static void amd_iommu_report_page_fault(u16 devid, u16 
domain_id,
if (pdev)
dev_data = dev_iommu_priv_get(>dev);
 
-   if (dev_data && __ratelimit(_data->rs)) {
-   pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x 
address=0x%llx flags=0x%04x]\n",
-   domain_id, address, flags);
-   } else if (printk_ratelimit()) {
-   pr_err("Event logged [IO_PAGE_FAULT device=%02x:%02x.%x 
domain=0x%04x address=0x%llx flags=0x%04x]\n",
+   if (dev_data) {
+   if (__ratelimit(_data->rs)) {
+   pci_err(pdev, "Event logged [IO_PAGE_FAULT 
domain=0x%04x address=0x%llx flags=0x%04x]\n",
+   domain_id, address, flags);
+   }
+   } else {
+   pr_err_ratelimited("Event logged [IO_PAGE_FAULT 
device=%02x:%02x.%x domain=0x%04x address=0x%llx flags=0x%04x]\n",
PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
domain_id, address, flags);
}
-- 
2.31.1
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/amd: Fix I/O page fault logging ratelimit test

2021-07-21 Thread Lennert Buytenhek
On Tue, Jul 20, 2021 at 07:05:50PM -0500, Suthikulpanit, Suravee wrote:

> Hi Lennert,

Hi Suravee,


> > On an AMD system, I/O page faults are usually logged like this:
> > 
> > diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> > index 811a49a95d04..7ae426b092f2 100644
> > --- a/drivers/iommu/amd/iommu.c
> > +++ b/drivers/iommu/amd/iommu.c
> > @@ -483,7 +483,7 @@ static void amd_iommu_report_page_fault(u16 devid, u16 
> > domain_id,
> > if (dev_data && __ratelimit(_data->rs)) {
> > pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x 
> > address=0x%llx flags=0x%04x]\n",
> > domain_id, address, flags);
> > -   } else if (printk_ratelimit()) {
> > +   } else if (!dev_data && printk_ratelimit()) {
> 
> This seems a bit confusing. Also, according to the following comment in 
> include/linux/printk.h:
> 
> /*
>  * Please don't use printk_ratelimit(), because it shares ratelimiting state
>  * with all other unrelated printk_ratelimit() callsites.  Instead use
>  * printk_ratelimited() or plain old __ratelimit().
>  */
> 
> We probably should move away from using printk_ratelimit() here.
> What about the following change instead?
> 
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 811a49a95d04..8eb5d3519743 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -480,11 +480,12 @@ static void amd_iommu_report_page_fault(u16 devid, u16 
> domain_id,
> if (pdev)
> dev_data = dev_iommu_priv_get(>dev);
> 
> -   if (dev_data && __ratelimit(_data->rs)) {
> -   pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x 
> address=0x%llx flags=0x%04x]\n",
> -   domain_id, address, flags);
> -   } else if (printk_ratelimit()) {
> -   pr_err("Event logged [IO_PAGE_FAULT device=%02x:%02x.%x 
> domain=0x%04x address=0x%llx flags=0x%04x]\n",
> +   if (dev_data) {
> +   if (__ratelimit(_data->rs))
> +   pci_err(pdev, "Event logged [IO_PAGE_FAULT 
> domain=0x%04x address=0x%llx flags=0x%04x]\n",
> +   domain_id, address, flags);
> +   } else {
> +   pr_err_ratelimited("Event logged [IO_PAGE_FAULT 
> device=%02x:%02x.%x domain=0x%04x address=0x%llx flags=0x%04x]\n",
> PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
> domain_id, address, flags);
> }

Looks good!


> Note also that there might be other places in this file that would need
> similar modification as well.

Indeed, there are two more sites like these.

I've sent a new patch that incorporates your feedback.  Thank you!


Cheers,
Lennert
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/amd: Fix I/O page fault logging ratelimit test

2021-07-18 Thread Lennert Buytenhek
On an AMD system, I/O page faults are usually logged like this:

drvname :05:00.0: Event logged [IO_PAGE_FAULT domain=0x 
address=0x92050da0 flags=0x0020]

But sometimes they are logged like this instead, even for the exact
same PCI device:

AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x 
address=0x92050de0 flags=0x0020]

This discrepancy appears to be caused by this code:

if (dev_data && __ratelimit(_data->rs)) {
pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x 
address=0x%llx flags=0x%04x]\n",
domain_id, address, flags);
} else if (printk_ratelimit()) {
pr_err("Event logged [IO_PAGE_FAULT device=%02x:%02x.%x 
domain=0x%04x address=0x%llx flags=0x%04x]\n",
PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
domain_id, address, flags);
}

If an I/O page fault occurs for a PCI device with associated
iommu_dev_data, but for which the __ratelimit(_data->rs) check fails,
we'll give it a second chance with printk_ratelimit(), and if that check
succeeds, we will log the fault anyway, but in a different format.

Change this to only check printk_ratelimit() if !dev_data, which seems to
be what had been originally intended.

Signed-off-by: Lennert Buytenhek 
---
 drivers/iommu/amd/iommu.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 811a49a95d04..7ae426b092f2 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -483,7 +483,7 @@ static void amd_iommu_report_page_fault(u16 devid, u16 
domain_id,
if (dev_data && __ratelimit(_data->rs)) {
pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x 
address=0x%llx flags=0x%04x]\n",
domain_id, address, flags);
-   } else if (printk_ratelimit()) {
+   } else if (!dev_data && printk_ratelimit()) {
pr_err("Event logged [IO_PAGE_FAULT device=%02x:%02x.%x 
domain=0x%04x address=0x%llx flags=0x%04x]\n",
PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
domain_id, address, flags);
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH,RFC] iommu/amd: Use report_iommu_fault()

2021-07-19 Thread Lennert Buytenhek
This patch makes iommu/amd call report_iommu_fault() when an I/O page
fault occurs, which has two effects:

1) It allows device drivers to register a callback to be notified of
   I/O page faults, via the iommu_set_fault_handler() API.

2) It triggers the io_page_fault tracepoint in report_iommu_fault()
   when an I/O page fault occurs.

I'm mainly interested in (2).  We have a daemon with some rasdaemon-like
functionality for handling platform errors, and being able to be notified
of I/O page faults for initiating corrective action is very useful -- and
receiving such events via event tracing is a lot nicer than having to
scrape them from kmsg.

A number of other IOMMU drivers already use report_iommu_fault(), and
I/O page faults on those IOMMUs therefore already seem to trigger this
tracepoint -- but this isn't (yet) the case for AMD-Vi and Intel DMAR.

I copied the logic from the other callers of report_iommu_fault(), where
if that function returns zero, the driver will have handled the fault,
in which case we avoid logging information about the fault to the printk
buffer from the IOMMU driver.

With this patch I see io_page_fault event tracing entries as expected:

   irq/24-AMD-Vi-48[002]    978.554289: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482640 flags=0x
   irq/24-AMD-Vi-48[002]    978.554294: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482650 flags=0x
   irq/24-AMD-Vi-48[002]    978.554299: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482660 flags=0x
   irq/24-AMD-Vi-48[002]    978.554305: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482670 flags=0x
   irq/24-AMD-Vi-48[002]    978.554310: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482680 flags=0x
   irq/24-AMD-Vi-48[002]    978.554315: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x914826a0 flags=0x

For determining IOMMU_FAULT_{READ,WRITE}, I followed the AMD IOMMU
spec, but I haven't tested that bit of the code, as the page faults I
encounter are all to non-present (!EVENT_FLAG_PR) mappings, in which
case EVENT_FLAG_RW doesn't make sense.

Signed-off-by: Lennert Buytenhek 
---
 drivers/iommu/amd/amd_iommu_types.h |4 
 drivers/iommu/amd/iommu.c   |   25 +
 2 files changed, 29 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 94c1a7a9876d..2f2c6630c24c 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -138,6 +138,10 @@
 #define EVENT_DOMID_MASK_HI0xf
 #define EVENT_FLAGS_MASK   0xfff
 #define EVENT_FLAGS_SHIFT  0x10
+#define EVENT_FLAG_TR  0x100
+#define EVENT_FLAG_RW  0x020
+#define EVENT_FLAG_PR  0x010
+#define EVENT_FLAG_I   0x008
 
 /* feature control bits */
 #define CONTROL_IOMMU_EN0x00ULL
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 811a49a95d04..a02ace7ee794 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -480,6 +480,30 @@ static void amd_iommu_report_page_fault(u16 devid, u16 
domain_id,
if (pdev)
dev_data = dev_iommu_priv_get(>dev);
 
+   if (dev_data) {
+   int report_flags;
+
+   /*
+* AMD I/O Virtualization Technology (IOMMU) Specification,
+* revision 3.00, section 2.5.3 ("IO_PAGE_FAULT Event") says
+* that the RW ("read-write") bit is only valid if the I/O
+* page fault was caused by a memory transaction request
+* referencing a page that was marked present.
+*/
+   report_flags = 0;
+   if ((flags & (EVENT_FLAG_TR | EVENT_FLAG_PR | EVENT_FLAG_I)) ==
+   EVENT_FLAG_PR) {
+   if (flags & EVENT_FLAG_RW)
+   report_flags |= IOMMU_FAULT_WRITE;
+   else
+   report_flags |= IOMMU_FAULT_READ;
+   }
+
+   if (!report_iommu_fault(_data->domain->domain,
+   >dev, address, report_flags))
+   goto out;
+   }
+
if (dev_data && __ratelimit(_data->rs)) {
pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x 
address=0x%llx flags=0x%04x]\n",
domain_id, address, flags);
@@ -489,6 +513,7 @@ static void amd_iommu_report_page_fault(u16 devid, u16 
domain_id,
domain_id, address, flags);
}
 
+out:
if (pdev)
pci_dev_put(pdev);
 }
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH,RFC] iommu/amd: Use report_iommu_fault()

2021-07-22 Thread Lennert Buytenhek
On Thu, Jul 22, 2021 at 02:26:08PM -0500, Suthikulpanit, Suravee wrote:

> Lennert,

Hi Suravee,


> > This patch makes iommu/amd call report_iommu_fault() when an I/O page
> > fault occurs, which has two effects:
> > 
> > 1) It allows device drivers to register a callback to be notified of
> > I/O page faults, via the iommu_set_fault_handler() API.
> > 
> > 2) It triggers the io_page_fault tracepoint in report_iommu_fault()
> > when an I/O page fault occurs.
> > 
> > I'm mainly interested in (2).  We have a daemon with some rasdaemon-like
> > functionality for handling platform errors, and being able to be notified
> > of I/O page faults for initiating corrective action is very useful -- and
> > receiving such events via event tracing is a lot nicer than having to
> > scrape them from kmsg.
> 
> Interesting. Just curious what types of error handling are done here?

For example, this daemon annotates PCI errors with the symbolic name of
the PCI device (including line card and ASIC number) that caused the
fault, which is useful when there are dozens of identical ASICs in a
system, and when hotplug makes it so that the offending PCI device might
not be in the system anymore by the time someone gets around to looking
at the fault, or a different line card may have been inserted in its place.


> > A number of other IOMMU drivers already use report_iommu_fault(), and
> > I/O page faults on those IOMMUs therefore already seem to trigger this
> > tracepoint -- but this isn't (yet) the case for AMD-Vi and Intel DMAR.
> > 
> > I copied the logic from the other callers of report_iommu_fault(), where
> > if that function returns zero, the driver will have handled the fault,
> > in which case we avoid logging information about the fault to the printk
> > buffer from the IOMMU driver.
> > 
> > With this patch I see io_page_fault event tracing entries as expected:
> > 
> > irq/24-AMD-Vi-48[002]    978.554289: io_page_fault: 
> > IOMMU:[drvname] :05:00.0 iova=0x91482640 flags=0x
> > irq/24-AMD-Vi-48[002]    978.554294: io_page_fault: 
> > IOMMU:[drvname] :05:00.0 iova=0x91482650 flags=0x
> > irq/24-AMD-Vi-48[002]    978.554299: io_page_fault: 
> > IOMMU:[drvname] :05:00.0 iova=0x91482660 flags=0x
> > irq/24-AMD-Vi-48[002]    978.554305: io_page_fault: 
> > IOMMU:[drvname] :05:00.0 iova=0x91482670 flags=0x
> > irq/24-AMD-Vi-48[002]    978.554310: io_page_fault: 
> > IOMMU:[drvname] :05:00.0 iova=0x91482680 flags=0x
> > irq/24-AMD-Vi-48[002]    978.554315: io_page_fault: 
> > IOMMU:[drvname] :05:00.0 iova=0x914826a0 flags=0x
> > 
> > For determining IOMMU_FAULT_{READ,WRITE}, I followed the AMD IOMMU
> > spec, but I haven't tested that bit of the code, as the page faults I
> > encounter are all to non-present (!EVENT_FLAG_PR) mappings, in which
> > case EVENT_FLAG_RW doesn't make sense.
> 
> Since, IO_PAGE_FAULT event is used to communicate various types of
> fault events, why don't we just pass the flags as-is? This way, it can
> be used to report/trace various types of IO_PAGE_FAULT events (e.g.
> for I/O page table, interrupt remapping, and etc).
> 
> Interested parties can register domain fault handler, and it can takes
> care of parsing information of the flag as needed.
> 
> > Signed-off-by: Lennert Buytenhek 
> > ---
> >   drivers/iommu/amd/amd_iommu_types.h |4 
> >   drivers/iommu/amd/iommu.c   |   25 +
> >   2 files changed, 29 insertions(+)
> > 
> > diff --git a/drivers/iommu/amd/amd_iommu_types.h 
> > b/drivers/iommu/amd/amd_iommu_types.h
> > index 94c1a7a9876d..2f2c6630c24c 100644
> > --- a/drivers/iommu/amd/amd_iommu_types.h
> > +++ b/drivers/iommu/amd/amd_iommu_types.h
> > @@ -138,6 +138,10 @@
> >   #define EVENT_DOMID_MASK_HI   0xf
> >   #define EVENT_FLAGS_MASK  0xfff
> >   #define EVENT_FLAGS_SHIFT 0x10
> > +#define EVENT_FLAG_TR  0x100
> > +#define EVENT_FLAG_RW  0x020
> > +#define EVENT_FLAG_PR  0x010
> > +#define EVENT_FLAG_I   0x008
> >   /* feature control bits */
> >   #define CONTROL_IOMMU_EN0x00ULL
> > diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> > index 811a49a95d04..a02ace7ee794 100644
> > --- a/drivers/iommu/amd/iommu.c
> > +++ b/drivers/iommu/amd/iommu.c
> > @@ -480,6 +480,30 @@ static void amd_iommu_report_page_fault(u16 devid, u16 
> > domain_id,
>

Re: [PATCH,RFC] iommu/amd: Use report_iommu_fault()

2021-07-26 Thread Lennert Buytenhek
On Mon, Jul 26, 2021 at 01:54:49PM +0200, Joerg Roedel wrote:

> Hi Lennert,

Hi Joerg,


> On Mon, Jul 19, 2021 at 12:54:43PM +0300, Lennert Buytenhek wrote:
> > +   if (dev_data) {
> > +   int report_flags;
> > +
> > +   /*
> > +* AMD I/O Virtualization Technology (IOMMU) Specification,
> > +* revision 3.00, section 2.5.3 ("IO_PAGE_FAULT Event") says
> > +* that the RW ("read-write") bit is only valid if the I/O
> > +* page fault was caused by a memory transaction request
> > +* referencing a page that was marked present.
> > +*/
> > +   report_flags = 0;
> > +   if ((flags & (EVENT_FLAG_TR | EVENT_FLAG_PR | EVENT_FLAG_I)) ==
> > +   EVENT_FLAG_PR) {
> > +   if (flags & EVENT_FLAG_RW)
> > +   report_flags |= IOMMU_FAULT_WRITE;
> > +   else
> > +   report_flags |= IOMMU_FAULT_READ;
> > +   }
> > +
> > +   if (!report_iommu_fault(_data->domain->domain,
> > +   >dev, address, report_flags))
> > +   goto out;
> > +   }
> 
> I'd like to limit calling report_iommu_fault() to dma-faults and leave
> IRQ remapping faults unreported. The IOMMU layer does not really care a
> lot about IRQs and a potential domain handler will also not be prepared
> to handler IRQ specific faults (there is no generic way to detect them).

I'm sorry -- this appears to have been a stupid oversight on my
part.  New patch coming up.


Thanks,
Lennert
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2] iommu/amd: Use report_iommu_fault()

2021-07-26 Thread Lennert Buytenhek
This patch makes iommu/amd call report_iommu_fault() when an I/O page
fault occurs, which has two effects:

1) It allows device drivers to register a callback to be notified of
   I/O page faults, via the iommu_set_fault_handler() API.

2) It triggers the io_page_fault tracepoint in report_iommu_fault()
   when an I/O page fault occurs.

I'm mainly interested in (2).  We have a daemon with some rasdaemon-like
functionality for handling platform errors, and being able to be notified
of I/O page faults for initiating corrective action is very useful -- and
receiving such events via event tracing is a lot nicer than having to
scrape them from kmsg.

A number of other IOMMU drivers already use report_iommu_fault(), and
I/O page faults on those IOMMUs therefore already seem to trigger this
tracepoint -- but this isn't (yet) the case for AMD-Vi and Intel DMAR.

I copied the logic from the other callers of report_iommu_fault(), where
if that function returns zero, the driver will have handled the fault,
in which case we avoid logging information about the fault to the printk
buffer from the IOMMU driver.

With this patch I see io_page_fault event tracing entries as expected:

   irq/24-AMD-Vi-48[002]    978.554289: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482640 flags=0x
   irq/24-AMD-Vi-48[002]    978.554294: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482650 flags=0x
   irq/24-AMD-Vi-48[002]    978.554299: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482660 flags=0x
   irq/24-AMD-Vi-48[002]    978.554305: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482670 flags=0x
   irq/24-AMD-Vi-48[002]    978.554310: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482680 flags=0x
   irq/24-AMD-Vi-48[002]    978.554315: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x914826a0 flags=0x

For determining IOMMU_FAULT_{READ,WRITE}, I followed the AMD IOMMU
spec, but I haven't tested that bit of the code, as the page faults I
encounter are all to non-present (!EVENT_FLAG_PR) mappings, in which
case EVENT_FLAG_RW doesn't make sense.

Signed-off-by: Lennert Buytenhek 
---
Changes since v1 RFC:

- Don't call report_iommu_fault() for IRQ remapping faults.
  (Suggested by Joerg Roedel.)

 drivers/iommu/amd/amd_iommu_types.h |  4 
 drivers/iommu/amd/iommu.c   | 29 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 94c1a7a9876d..2f2c6630c24c 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -138,6 +138,10 @@
 #define EVENT_DOMID_MASK_HI0xf
 #define EVENT_FLAGS_MASK   0xfff
 #define EVENT_FLAGS_SHIFT  0x10
+#define EVENT_FLAG_TR  0x100
+#define EVENT_FLAG_RW  0x020
+#define EVENT_FLAG_PR  0x010
+#define EVENT_FLAG_I   0x008
 
 /* feature control bits */
 #define CONTROL_IOMMU_EN0x00ULL
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a7d6d78147b7..d9fb2c22d44a 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -484,6 +484,34 @@ static void amd_iommu_report_page_fault(u16 devid, u16 
domain_id,
if (pdev)
dev_data = dev_iommu_priv_get(>dev);
 
+   /*
+* If this is a DMA fault (for which the I(nterrupt) bit will
+* be unset), allow report_iommu_fault() to prevent logging it.
+*/
+   if (dev_data && ((flags & EVENT_FLAG_I) == 0)) {
+   int report_flags;
+
+   /*
+* AMD I/O Virtualization Technology (IOMMU) Specification,
+* revision 3.00, section 2.5.3 ("IO_PAGE_FAULT Event") says
+* that the RW ("read-write") bit is only valid if the I/O
+* page fault was caused by a memory transaction request
+* referencing a page that was marked present.
+*/
+   report_flags = 0;
+   if ((flags & (EVENT_FLAG_TR | EVENT_FLAG_PR)) ==
+   EVENT_FLAG_PR) {
+   if (flags & EVENT_FLAG_RW)
+   report_flags |= IOMMU_FAULT_WRITE;
+   else
+   report_flags |= IOMMU_FAULT_READ;
+   }
+
+   if (!report_iommu_fault(_data->domain->domain,
+   >dev, address, report_flags))
+   goto out;
+   }
+
if (dev_data) {
if (__ratelimit(_data->rs)) {
pci_err(pdev, "Event logged [IO_PAGE_FAULT 
domain=0x%04x address=0x%llx flags=0x%04x]\n",
@@ -495,6 +523,7 @@ static void amd_iommu_report_page_fault(u16 devid,