RE: [PATCH v9 03/10] iommu: Separate IOMMU_DEV_FEAT_IOPF from IOMMU_DEV_FEAT_SVA

2021-01-17 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Saturday, January 16, 2021 11:54 AM
> 
> Hi Jean,
> 
> On 2021/1/15 0:41, Jean-Philippe Brucker wrote:
> > I guess detailing what's needed for nested IOPF can help the discussion,
> > although I haven't seen any concrete plan about implementing it, and it
> > still seems a couple of years away. There are two important steps with
> > nested IOPF:
> >
> > (1) Figuring out whether a fault comes from L1 or L2. A SMMU stall event
> >  comes with this information, but a PRI page request doesn't. The IOMMU
> >  driver has to first translate the IOVA to a GPA, injecting the fault
> >  into the guest if this translation fails by using the usual
> >  iommu_report_device_fault().

The IOMMU driver can walk the page tables to find out the level information.
If the walk terminates at the 1st level, inject the fault into the guest;
otherwise fix the mm fault at the 2nd level. It's not as efficient as
hardware-provided info, but it's doable, and the actual overhead needs to be
measured (optimizations exist, e.g. having the fault client hint that no
2nd-level fault is expected when registering its fault handler in the pinned
case).
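
The walk-and-classify idea above can be sketched roughly as below. This is a
hypothetical standalone model, not the actual VT-d or SMMU driver code: stage 1
maps IOVA->GPA, stage 2 maps GPA->HPA, and the stage whose translation is
missing tells us where the fault belongs. All names (struct stage,
classify_fault, etc.) are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define NENT	4
#define INVALID	UINT64_MAX

/* Toy translation table: iova[i] -> out[i]; INVALID marks empty slots. */
struct stage { uint64_t iova[NENT]; uint64_t out[NENT]; };

static uint64_t lookup(const struct stage *s, uint64_t addr)
{
	for (size_t i = 0; i < NENT; i++)
		if (s->iova[i] == addr)
			return s->out[i];
	return INVALID;
}

enum fault_level { FAULT_STAGE1, FAULT_STAGE2, FAULT_NONE };

/*
 * Walk stage 1 first: a missing IOVA->GPA translation is a guest (1st-level)
 * fault to inject. If stage 1 resolves but GPA->HPA does not, it is a host
 * (2nd-level) fault to fix via handle_mm_fault().
 */
static enum fault_level classify_fault(const struct stage *s1,
				       const struct stage *s2, uint64_t iova)
{
	uint64_t gpa = lookup(s1, iova);

	if (gpa == INVALID)
		return FAULT_STAGE1;
	if (lookup(s2, gpa) == INVALID)
		return FAULT_STAGE2;
	return FAULT_NONE;
}
```

The pinned-case optimization mentioned above would simply let the client skip
the stage-2 lookup entirely and always treat the fault as stage 1.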

> >
> > (2) Translating the faulting GPA to a HVA that can be fed to
> >  handle_mm_fault(). That requires help from KVM, so another interface -
> >  either KVM registering GPA->HVA translation tables or IOMMU driver
> >  querying each translation. Either way it should be reusable by device
> >  drivers that implement IOPF themselves.

Or just leave it to the fault client (say VFIO here) to figure it out. VFIO
has the GPA->HPA information and can then call handle_mm_fault() to fix the
received fault. The IOMMU driver just exports an interface for device drivers
which implement IOPF themselves to report a fault, which is then handled by
the IOMMU core by reusing the same faulting path.
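
The shared reporting path described above could look roughly like this sketch:
one handler per device, registered by the fault client, with a single report
entry point used both by the IOMMU driver and by device drivers with native
IOPF support. The names (iopf_report, iopf_register_handler, struct
device_ctx) are hypothetical, not the existing kernel API.

```c
#include <stdint.h>

struct iopf_fault { uint64_t addr; uint32_t pasid; };

typedef int (*iopf_handler_t)(struct iopf_fault *f, void *data);

/* Per-device state kept by the (hypothetical) IOMMU core. */
struct device_ctx {
	iopf_handler_t handler;	/* installed by the fault client, e.g. VFIO */
	void *data;
};

static int iopf_register_handler(struct device_ctx *dev,
				 iopf_handler_t h, void *data)
{
	dev->handler = h;
	dev->data = data;
	return 0;
}

/*
 * Single reporting entry point: the IOMMU driver and device drivers that
 * implement IOPF themselves both call this, so every fault flows through
 * the same path regardless of how it was delivered.
 */
static int iopf_report(struct device_ctx *dev, struct iopf_fault *f)
{
	if (!dev->handler)
		return -1;	/* no client registered: abort the fault */
	return dev->handler(f, dev->data);
}
```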

> >
> > (1) could be enabled with iommu_dev_enable_feature(). (2) requires a more
> > complex interface. (2) alone might also be desirable - demand-paging for
> > level 2 only, no SVA for level 1.

Yes, this is what we want to point out. A general FEAT_IOPF implies more than
what this patch intended to address.

> >
> > Anyway, back to this patch. What I'm trying to convey is "can the IOMMU
> > receive incoming I/O page faults for this device and, when SVA is enabled,
> > feed them to the mm subsystem?  Enable that or return an error." I'm stuck
> > on the name. IOPF alone is too vague. Not IOPF_L1 as Kevin noted, since L1
> > is also used in virtualization. IOPF_BIND and IOPF_SVA could also mean (2)
> > above. IOMMU_DEV_FEAT_IOPF_FLAT?
> >
> > That leaves space for the nested extensions. (1) above could be
> > IOMMU_FEAT_IOPF_NESTED, and (2) requires some new interfacing with KVM (or
> > just an external fault handler) and could be used with either IOPF_FLAT or
> > IOPF_NESTED. We can figure out the details later. What do you think?
> 
> I agree that we can define IOPF_ for current usage and leave space for
> future extensions.
> 
> IOPF_FLAT represents IOPF on first-level translation, currently first
> level translation could be used in below cases.
> 
> 1) FL w/ internal Page Table: Kernel IOVA;
> 2) FL w/ external Page Table: VFIO passthrough;
> 3) FL w/ shared CPU page table: SVA
> 
> We don't need to support IOPF for case 1). Let's put it aside.
> 
> IOPF handling of 2) and 3) are different. Do we need to define different
> names to distinguish these two cases?
> 

Defining feature names according to various use cases does not sound like a
clean approach. Ideally we should have just a general FEAT_IOPF, since the
hardware (at least VT-d) does support faults in 1st-level, 2nd-level, or
nested configurations. We are running into this trouble only because it is
difficult for the software to enable the full hardware capability in one
batch. My last proposal was to keep FEAT_IOPF as a general capability for
whether faults are delivered through the IOMMU or the ad-hoc device, and then
have a separate interface for whether IOPF reporting is available under a
specific configuration. The former is about the path between the IOMMU and
the device, while the latter is about the interface between the IOMMU driver
and its faulting client.

The reporting capability can be checked when the fault client registers its
fault handler; at that point the IOMMU driver knows how the related mapping
is configured (1st, 2nd, or nested) and whether fault reporting is supported
in that configuration. We may introduce IOPF_REPORT_FLAT and
IOPF_REPORT_NESTED respectively. While IOPF_REPORT_FLAT detection is
straightforward (cases 2 and 3 can be differentiated internally based on the
configured level), IOPF_REPORT_NESTED needs additional info to indicate which
level is concerned, since the vendor driver may not support fault reporting
at both levels, or the fault client may be interested in only one level
(e.g. with the 2nd level pinned).
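
A minimal sketch of that registration check, under the assumption that the
client passes IOPF_REPORT_* flags when installing its handler and the driver
rejects combinations the current configuration cannot satisfy. Every name
here (the flags, struct dev_cfg, iopf_register) is illustrative, not a
proposed kernel interface.

```c
#include <stdbool.h>

enum iopf_report_flags {
	IOPF_REPORT_FLAT	= 1 << 0, /* 1st-level-only configuration */
	IOPF_REPORT_NESTED_L1	= 1 << 1, /* nested: guest-level faults */
	IOPF_REPORT_NESTED_L2	= 1 << 2, /* nested: host-level faults */
};

struct dev_cfg {
	bool nested;		/* how the mapping is configured */
	unsigned int supported;	/* what the vendor driver can report */
};

/*
 * Accept the registration only if the requested reporting mode matches the
 * device's configuration and the vendor driver's capability, e.g. reject a
 * request for L2 reports when the 2nd level is pinned and the driver thus
 * never delivers L2 faults.
 */
static int iopf_register(const struct dev_cfg *cfg, unsigned int wanted)
{
	if (!cfg->nested && wanted != IOPF_REPORT_FLAT)
		return -1;
	if (cfg->nested && (wanted & IOPF_REPORT_FLAT))
		return -1;
	if ((wanted & cfg->supported) != wanted)
		return -1;
	return 0;
}
```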

Thanks
Kevin

[PATCH] iommu/amd: Make use of EFR from IVHD when available

2021-01-17 Thread Suravee Suthikulpanit
IOMMU Extended Feature Register (EFR) is used to communicate
the supported features for each IOMMU to the IOMMU driver.
This is normally read from the PCI MMIO register offset 0x30,
and used by the iommu_feature() helper function.

However, there are certain scenarios where the information is needed
prior to PCI initialization, and the iommu_feature() function is used
prematurely without warning. This has caused incorrect initialization of
the IOMMU.

The EFR is also available in the IVHD header, and is available to
the driver prior to PCI initialization. Therefore, default to using
the IVHD EFR instead.

Fixes: 6d39bdee238f ("iommu/amd: Enforce 4k mapping for certain IOMMU data structures")
Tested-by: Brijesh Singh 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu.h   |  3 ++-
 drivers/iommu/amd/amd_iommu_types.h |  4 +++
 drivers/iommu/amd/init.c| 39 +++--
 3 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 6b8cbdf71714..0a89e9c4f7b3 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -86,7 +86,8 @@ static inline bool is_rd890_iommu(struct pci_dev *pdev)
 
 static inline bool iommu_feature(struct amd_iommu *iommu, u64 f)
 {
-   if (!(iommu->cap & (1 << IOMMU_CAP_EFR)))
+   /* features == 0 means EFR is not supported */
+   if (!iommu->features)
return false;
 
return !!(iommu->features & f);
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 553587827771..35331e458dd1 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -387,6 +387,10 @@
 #define IOMMU_CAP_NPCACHE 26
 #define IOMMU_CAP_EFR 27
 
+/* IOMMU IVINFO */
+#define IOMMU_IVINFO_OFFSET  36
+#define IOMMU_IVINFO_EFRSUP_SHIFT0
+
 /* IOMMU Feature Reporting Field (for IVHD type 10h */
 #define IOMMU_FEAT_GASUP_SHIFT 6
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 6a1f7048dacc..28b1d2feec96 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -257,6 +257,8 @@ static void init_device_table_dma(void);
 
 static bool amd_iommu_pre_enabled = true;
 
+static u32 amd_iommu_ivinfo;
+
 bool translation_pre_enabled(struct amd_iommu *iommu)
 {
return (iommu->flags & AMD_IOMMU_FLAG_TRANS_PRE_ENABLED);
@@ -1577,6 +1579,14 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
 
if (h->efr_reg & BIT(IOMMU_EFR_XTSUP_SHIFT))
amd_iommu_xt_mode = IRQ_REMAP_X2APIC_MODE;
+
+   /*
+* For IVHD type 0x11/0x40, EFR is also available via IVHD.
+* Default to IVHD EFR since it is available sooner
+* (i.e. before PCI init).
+*/
+   if (amd_iommu_ivinfo & (1 << IOMMU_IVINFO_EFRSUP_SHIFT))
+   iommu->features = h->efr_reg;
break;
default:
return -EINVAL;
@@ -1770,6 +1780,29 @@ static const struct attribute_group *amd_iommu_groups[] = {
NULL,
 };
 
+/*
+ * Note: IVHD 0x11 and 0x40 also contains exact copy
+ * of the IOMMU Extended Feature Register [MMIO Offset 0030h].
+ * Default to EFR in IVHD since it is available sooner (i.e. before PCI init).
+ * However, sanity check and warn if they conflict.
+ */
+static void __init iommu_init_features(struct amd_iommu *iommu)
+{
+   u64 features;
+
+   if (!(iommu->cap & (1 << IOMMU_CAP_EFR)))
+   return;
+
+   /* read extended feature bits */
+   features = readq(iommu->mmio_base + MMIO_EXT_FEATURES);
+
+   if (iommu->features && (features != iommu->features))
+   pr_err(FW_BUG "EFR mismatch. Use IVHD EFR (%#llx : %#llx).\n",
+  features, iommu->features);
+   else
+   iommu->features = features;
+}
+
 static int __init iommu_init_pci(struct amd_iommu *iommu)
 {
int cap_ptr = iommu->cap_ptr;
@@ -1789,8 +1822,7 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
if (!(iommu->cap & (1 << IOMMU_CAP_IOTLB)))
amd_iommu_iotlb_sup = false;
 
-   /* read extended feature bits */
-   iommu->features = readq(iommu->mmio_base + MMIO_EXT_FEATURES);
+   iommu_init_features(iommu);
 
if (iommu_feature(iommu, FEATURE_GT)) {
int glxval;
@@ -2661,6 +2693,9 @@ static int __init early_amd_iommu_init(void)
if (ret)
goto out;
 
+   /* Store IVRS IVinfo field. */
+   amd_iommu_ivinfo = *((u32 *)((u8 *)ivrs_base + IOMMU_IVINFO_OFFSET));
+
amd_iommu_target_ivhd_type = get_highest_supported_ivhd_type(ivrs_base);
DUMP_printk("Using IVHD type %#x\n", amd_iommu_target_ivhd_type);
 
-- 
2.17.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org