Re: [PATCH v3 7/7] iommu/vt-d: Differentiate relaxable and non relaxable RMRRs
Hi Eric, On 5/16/19 6:08 PM, Eric Auger wrote: Now we have a new IOMMU_RESV_DIRECT_RELAXABLE reserved memory region type, let's report USB and GFX RMRRs as relaxable ones. This allows to have a finer reporting at IOMMU API level of reserved memory regions. This will be exploitable by VFIO to define the usable IOVA range and detect potential conflicts between the guest physical address space and host reserved regions. Signed-off-by: Eric Auger --- drivers/iommu/intel-iommu.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index a36604f4900f..af1d65fdedfc 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -5493,7 +5493,9 @@ static void intel_iommu_get_resv_regions(struct device *device, for_each_rmrr_units(rmrr) { for_each_active_dev_scope(rmrr->devices, rmrr->devices_cnt, i, i_dev) { + struct pci_dev *pdev = to_pci_dev(device); Probably should be: struct pci_dev *pdev = dev_is_pci(device) ? to_pci_dev(device) : NULL; Best regards, Lu Baolu
[PATCH 3/3] iommu/amd: only free resources once on init error
When amd_iommu=off was specified on the command line, free_X_resources functions were called immediately after early_amd_iommu_init. They were then called again when amd_iommu_init also failed (as expected). Instead, call them only once: at the end of state_next() whenever there's an error. These functions should be safe to call any time and any number of times. However, since state_next is never called again in an error state, the cleanup will only ever be run once. This also ensures that cleanup code is run as soon as possible after an error is detected rather than waiting for amd_iommu_init() to be called. Signed-off-by: Kevin Mitchell --- drivers/iommu/amd_iommu_init.c | 27 +-- 1 file changed, 13 insertions(+), 14 deletions(-) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index 5f3df5ae6ba8..24fc060fe596 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -2638,8 +2638,6 @@ static int __init state_next(void) init_state = ret ? IOMMU_INIT_ERROR : IOMMU_ACPI_FINISHED; if (init_state == IOMMU_ACPI_FINISHED && amd_iommu_disabled) { pr_info("AMD IOMMU disabled on kernel command-line\n"); - free_dma_resources(); - free_iommu_resources(); init_state = IOMMU_CMDLINE_DISABLED; ret = -EINVAL; } @@ -2680,6 +2678,19 @@ static int __init state_next(void) BUG(); } + if (ret) { + free_dma_resources(); + if (!irq_remapping_enabled) { + disable_iommus(); + free_iommu_resources(); + } else { + struct amd_iommu *iommu; + + uninit_device_table_dma(); + for_each_iommu(iommu) + iommu_flush_all_caches(iommu); + } + } return ret; } @@ -2753,18 +2764,6 @@ static int __init amd_iommu_init(void) int ret; ret = iommu_go_to_state(IOMMU_INITIALIZED); - if (ret) { - free_dma_resources(); - if (!irq_remapping_enabled) { - disable_iommus(); - free_iommu_resources(); - } else { - uninit_device_table_dma(); - for_each_iommu(iommu) - iommu_flush_all_caches(iommu); - } - } - #ifdef CONFIG_GART_IOMMU if (ret && list_empty(_iommu_list)) { /* -- 2.20.1
[PATCH 2/3] iommu/amd: move gart fallback to amd_iommu_init
The fallback to the GART driver in the case amd_iommu doesn't work was executed in a function called free_iommu_resources, which didn't really make sense. This was even being called twice if amd_iommu=off was specified on the command line. The only complication is that it needs to be verified that amd_iommu has fully relinquished control by calling free_iommu_resources and emptying the amd_iommu_list. Signed-off-by: Kevin Mitchell --- drivers/iommu/amd_iommu_init.c | 19 ++- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index 3798d7303c99..5f3df5ae6ba8 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -2345,15 +2345,6 @@ static void __init free_iommu_resources(void) amd_iommu_dev_table = NULL; free_iommu_all(); - -#ifdef CONFIG_GART_IOMMU - /* -* We failed to initialize the AMD IOMMU - try fallback to GART -* if possible. -*/ - gart_iommu_init(); - -#endif } /* SB IOAPIC is always on this device in AMD systems */ @@ -2774,6 +2765,16 @@ static int __init amd_iommu_init(void) } } +#ifdef CONFIG_GART_IOMMU + if (ret && list_empty(_iommu_list)) { + /* +* We failed to initialize the AMD IOMMU - try fallback +* to GART if possible. +*/ + gart_iommu_init(); + } +#endif + for_each_iommu(iommu) amd_iommu_debugfs_setup(iommu); -- 2.20.1
[PATCH 1/3] iommu/amd: make iommu_disable safer
Make it safe to call iommu_disable during early init error conditions before mmio_base is set, but after the struct amd_iommu has been added to the amd_iommu_list. For example, this happens if firmware fails to fill in mmio_phys in the ACPI table leading to a NULL pointer dereference in iommu_feature_disable. Signed-off-by: Kevin Mitchell --- drivers/iommu/amd_iommu_init.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index f773792d77fd..3798d7303c99 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -424,6 +424,9 @@ static void iommu_enable(struct amd_iommu *iommu) static void iommu_disable(struct amd_iommu *iommu) { + if (!iommu->mmio_base) + return; + /* Disable command buffer */ iommu_feature_disable(iommu, CONTROL_CMDBUF_EN); -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 0/3] handle init errors more gracefully in amd_iommu
This series makes error handling more robust in the amd_iommu init code. It was initially motivated by problematic firmware that does not set up the physical address of the iommu. This led to a NULL dereference panic when iommu_disable was called during cleanup. While the first patch is sufficient to avoid the panic, the subsequent two move the cleanup closer to the actual error and avoid calling the cleanup code twice when amd_iommu=off is specified on the command line. I have tested this series on a variety of AMD CPUs with firmware exhibiting the issue. I have additionally tested on platforms where the firmware has been fixed. I tried both with and without amd_iommu=off. I have also tested on older CPUs where no IOMMU is detected and even one where the GART driver ends up running. Thanks, Kevin Kevin Mitchell (3): iommu/amd: make iommu_disable safer iommu/amd: move gart fallback to amd_iommu_init iommu/amd: only free resources once on init error drivers/iommu/amd_iommu_init.c | 45 ++ 1 file changed, 24 insertions(+), 21 deletions(-) -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/4] vfio: vfio_iommu_type1: Define VFIO_IOMMU_INFO_CAPABILITIES
On Thu, 16 May 2019 16:57:42 +0200 Christian Borntraeger wrote: > Alex, > > patch 1 and 3 are s390 specific, 2 and 4 are vfio common code. > Are you ok with the common code changes? If yes, would you prefer to have this > via the s390 tree (Martin) or your tree? Hi Christian, The vfio code still needs work imo, and I'm not sure it isn't somewhat abusive of the iommu attribute interface as well. I don't necessarily have a problem with it ultimately going through the s390 tree, but let's see what comes in the next revision. Thanks, Alex > On 10.05.19 10:22, Pierre Morel wrote: > > To use the VFIO_IOMMU_GET_INFO to retrieve IOMMU specific information, > > we define a new flag VFIO_IOMMU_INFO_CAPABILITIES in the > > vfio_iommu_type1_info structure and the associated capability > > information block. > > > > Signed-off-by: Pierre Morel > > --- > > include/uapi/linux/vfio.h | 10 ++ > > 1 file changed, 10 insertions(+) > > > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > > index 8f10748..8f68e0f 100644 > > --- a/include/uapi/linux/vfio.h > > +++ b/include/uapi/linux/vfio.h > > @@ -715,6 +715,16 @@ struct vfio_iommu_type1_info { > > __u32 flags; > > #define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */ > > __u64 iova_pgsizes; /* Bitmap of supported page sizes */ > > +#define VFIO_IOMMU_INFO_CAPABILITIES (1 << 1) /* support capabilities > > info */ > > + __u64 cap_offset; /* Offset within info struct of first cap */ > > +}; > > + > > +#define VFIO_IOMMU_INFO_CAP_QFN1 > > +#define VFIO_IOMMU_INFO_CAP_QGRP 2 > > + > > +struct vfio_iommu_type1_info_block { > > + struct vfio_info_cap_header header; > > + __u32 data[]; > > }; > > > > #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12) > > >
Re: [PATCH 2/4] vfio: vfio_iommu_type1: Define VFIO_IOMMU_INFO_CAPABILITIES
On Fri, 10 May 2019 10:22:33 +0200 Pierre Morel wrote: > To use the VFIO_IOMMU_GET_INFO to retrieve IOMMU specific information, > we define a new flag VFIO_IOMMU_INFO_CAPABILITIES in the > vfio_iommu_type1_info structure and the associated capability > information block. > > Signed-off-by: Pierre Morel > --- > include/uapi/linux/vfio.h | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > index 8f10748..8f68e0f 100644 > --- a/include/uapi/linux/vfio.h > +++ b/include/uapi/linux/vfio.h > @@ -715,6 +715,16 @@ struct vfio_iommu_type1_info { > __u32 flags; > #define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */ > __u64 iova_pgsizes; /* Bitmap of supported page sizes */ > +#define VFIO_IOMMU_INFO_CAPABILITIES (1 << 1) /* support capabilities info > */ > + __u64 cap_offset; /* Offset within info struct of first cap */ > +}; > + > +#define VFIO_IOMMU_INFO_CAP_QFN 1 > +#define VFIO_IOMMU_INFO_CAP_QGRP 2 Descriptions? > + > +struct vfio_iommu_type1_info_block { > + struct vfio_info_cap_header header; > + __u32 data[]; > }; > > #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12) This is just a blob of data, what's the API? How do we revision it? How does the user know how to interpret it? Dumping kernel internal structures out to userspace like this is not acceptable, define a user API. Thanks, Alex
Re: [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
On 16/05/2019 14:23, Auger Eric wrote: Hi Robin, On 5/16/19 2:46 PM, Robin Murphy wrote: On 16/05/2019 11:08, Eric Auger wrote: Introduce a new type for reserved region. This corresponds to directly mapped regions which are known to be relaxable in some specific conditions, such as device assignment use case. Well known examples are those used by USB controllers providing PS/2 keyboard emulation for pre-boot BIOS and early BOOT or RMRRs associated to IGD working in legacy mode. Since commit c875d2c1b808 ("iommu/vt-d: Exclude devices using RMRRs from IOMMU API domains") and commit 18436afdc11a ("iommu/vt-d: Allow RMRR on graphics devices too"), those regions are currently considered "safe" with respect to device assignment use case which requires a non direct mapping at IOMMU physical level (RAM GPA -> HPA mapping). Those RMRRs currently exist and sometimes the device is attempting to access it but this has not been considered an issue until now. However at the moment, iommu_get_group_resv_regions() is not able to make any difference between directly mapped regions: those which must be absolutely enforced and those like above ones which are known as relaxable. This is a blocker for reporting severe conflicts between non relaxable RMRRs (like MSI doorbells) and guest GPA space. With this new reserved region type we will be able to use iommu_get_group_resv_regions() to enumerate the IOVA space that is usable through the IOMMU API without introducing regressions with respect to existing device assignment use cases (USB and IGD). Signed-off-by: Eric Auger --- Note: At the moment the sysfs ABI is not changed. However I wonder whether it wouldn't be preferable to report the direct region as "direct_relaxed" there. At the moment, in case the same direct region is used by 2 devices, one USB/GFX and another not belonging to the previous categories, the direct region will be output twice with "direct" type. Hmm, that sounds a bit off - if we have overlapping regions within the same domain, then we need to do some additional pre-processing to adjust them anyway, since any part of a relaxable region which overlaps a non-relaxable region cannot actually be relaxed, and so really should never be described as such. In iommu_insert_resv_region(), we are overlapping regions of the same type. So iommu_get_group_resv_regions() should return both the relaxable region and non relaxable one. I should test this again using a hacked kernel though. We should still consider relaxable regions as being able to merge back in to regular direct regions to a degree - If a relaxable region falls entirely within a direct one then there's no point exposing it because the direct region *has* to take precedence and be enforced. If there is an incomplete overlap then we could possibly just trust consumers to see it and give the direct region precedence themselves, but since the relaxable region is our own in-kernel invention rather than firmware gospel I think it would be safer to truncate it to just its non-overlapping part. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
On 16/05/2019 18:06, Alex Williamson wrote: On Thu, 16 May 2019 14:58:08 +0200 Auger Eric wrote: Hi Jean-Philippe, On 5/16/19 2:43 PM, Jean-Philippe Brucker wrote: On 16/05/2019 12:45, Auger Eric wrote: Hi Jean-Philippe, On 5/16/19 1:16 PM, Jean-Philippe Brucker wrote: On 16/05/2019 11:08, Eric Auger wrote: Note: At the moment the sysfs ABI is not changed. However I wonder whether it wouldn't be preferable to report the direct region as "direct_relaxed" there. At the moment, in case the same direct region is used by 2 devices, one USB/GFX and another not belonging to the previous categories, the direct region will be output twice with "direct" type. This would unblock Shameer's series: [PATCH v6 0/7] vfio/type1: Add support for valid iova list management https://patchwork.kernel.org/patch/10425309/ Thanks for doing this! which failed to get pulled for 4.18 merge window due to IGD device assignment regression. v2 -> v3: - fix direct type check --- drivers/iommu/iommu.c | 12 +++- include/linux/iommu.h | 6 ++ 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index ae4ea5c0e6f9..28c3d6351832 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -73,10 +73,11 @@ struct iommu_group_attribute { }; static const char * const iommu_group_resv_type_string[] = { - [IOMMU_RESV_DIRECT] = "direct", - [IOMMU_RESV_RESERVED] = "reserved", - [IOMMU_RESV_MSI]= "msi", - [IOMMU_RESV_SW_MSI] = "msi", + [IOMMU_RESV_DIRECT] = "direct", + [IOMMU_RESV_DIRECT_RELAXABLE] = "direct", + [IOMMU_RESV_RESERVED] = "reserved", + [IOMMU_RESV_MSI]= "msi", + [IOMMU_RESV_SW_MSI] = "msi", }; #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store) \ @@ -573,7 +574,8 @@ static int iommu_group_create_direct_mappings(struct iommu_group *group, start = ALIGN(entry->start, pg_size); end = ALIGN(entry->start + entry->length, pg_size); - if (entry->type != IOMMU_RESV_DIRECT) + if (entry->type != IOMMU_RESV_DIRECT && + entry->type != IOMMU_RESV_DIRECT_RELAXABLE) I'm trying to understand why you need to create direct mappings at all for these relaxable regions. In the host the region is needed for legacy device features, which are disabled (and cannot be re-enabled) when assigning the device to a guest? This follows Kevin's comment in the thread below: https://patchwork.kernel.org/patch/10449103/#21957279 In normal DMA API host path, those regions need to be 1-1 mapped. They are likely to be accessed by the driver or FW at early boot phase or even during execution, depending on features being used. That's the reason, according to Kevin we couldn't hide them. We just know that, in general, they are not used anymore when assigning the device or if accesses are attempted this generally does not block the assignment use case. For example, it is said in https://github.com/qemu/qemu/blob/master/docs/igd-assign.txt that in legacy IGD assignment use case, there may be "a small numbers of DMAR faults when initially assigned". Hmm, fair enough. That doesn't sound too good, if the device might perform arbitrary writes into guest memory once new IOMMU mappings are in place. I was wondering if we could report some IOVA ranges as "available but avoid if possible". In Shameer's series we currently reject any vfio dma_map that would fall into an RMRR (hence the regression on existing USB/GFX use case). With the relaxable RMRR info we could imagine to let the userspace choose whether we want to proceed with the dma_map despite the risk or introduce a vfio_iommu_type1 module option (turned off by default for not regressing existing USB/GFX passthrough) that would forbid dma_map on relaxable RMRR regions. Yep, the risk that Jean-Philippe mentions is real, the IGD device has the stolen memory addresses latched into the hardware and we're unable to change that. What we try to do now is trap page table writes to the device and translate them to a VM allocated stolen memory range, which is sufficient for getting a BIOS splash screen, but we really want to assume that the OS level driver just doesn't use the stolen memory range. There was a time when it seemed like we could assume the Intel drivers were heading in that direction, but it seems that's no longer an actual goal. To fully support IGD assignment in a way that isn't as fragile as it is today, we'd want to re-export the RMRR out to userspace so that QEMU could identity map it into the VM address space. That's not trivial, it's only one of several issues around IGD assignment, and we've got GVT-g (Intel vGPUs) now that don't impose these requirements, so motivation to tackle the issue is somewhat reduced. With the changes here, we might want vfio to
Re: [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
On Thu, 16 May 2019 14:58:08 +0200 Auger Eric wrote: > Hi Jean-Philippe, > > On 5/16/19 2:43 PM, Jean-Philippe Brucker wrote: > > On 16/05/2019 12:45, Auger Eric wrote: > >> Hi Jean-Philippe, > >> > >> On 5/16/19 1:16 PM, Jean-Philippe Brucker wrote: > >>> On 16/05/2019 11:08, Eric Auger wrote: > Note: At the moment the sysfs ABI is not changed. However I wonder > whether it wouldn't be preferable to report the direct region as > "direct_relaxed" there. At the moment, in case the same direct > region is used by 2 devices, one USB/GFX and another not belonging > to the previous categories, the direct region will be output twice > with "direct" type. > > This would unblock Shameer's series: > [PATCH v6 0/7] vfio/type1: Add support for valid iova list management > https://patchwork.kernel.org/patch/10425309/ > >>> > >>> Thanks for doing this! > >>> > which failed to get pulled for 4.18 merge window due to IGD > device assignment regression. > > v2 -> v3: > - fix direct type check > --- > drivers/iommu/iommu.c | 12 +++- > include/linux/iommu.h | 6 ++ > 2 files changed, 13 insertions(+), 5 deletions(-) > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > index ae4ea5c0e6f9..28c3d6351832 100644 > --- a/drivers/iommu/iommu.c > +++ b/drivers/iommu/iommu.c > @@ -73,10 +73,11 @@ struct iommu_group_attribute { > }; > > static const char * const iommu_group_resv_type_string[] = { > -[IOMMU_RESV_DIRECT] = "direct", > -[IOMMU_RESV_RESERVED] = "reserved", > -[IOMMU_RESV_MSI]= "msi", > -[IOMMU_RESV_SW_MSI] = "msi", > +[IOMMU_RESV_DIRECT] = "direct", > +[IOMMU_RESV_DIRECT_RELAXABLE] = "direct", > +[IOMMU_RESV_RESERVED] = "reserved", > +[IOMMU_RESV_MSI]= "msi", > +[IOMMU_RESV_SW_MSI] = "msi", > }; > > #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store) \ > @@ -573,7 +574,8 @@ static int iommu_group_create_direct_mappings(struct > iommu_group *group, > start = ALIGN(entry->start, pg_size); > end = ALIGN(entry->start + entry->length, pg_size); > > -if (entry->type != IOMMU_RESV_DIRECT) > +if (entry->type != IOMMU_RESV_DIRECT && > +entry->type != IOMMU_RESV_DIRECT_RELAXABLE) > >>> > >>> I'm trying to understand why you need to create direct mappings at all > >>> for these relaxable regions. In the host the region is needed for legacy > >>> device features, which are disabled (and cannot be re-enabled) when > >>> assigning the device to a guest? > >> This follows Kevin's comment in the thread below: > >> https://patchwork.kernel.org/patch/10449103/#21957279 > >> > >> In normal DMA API host path, those regions need to be 1-1 mapped. They > >> are likely to be accessed by the driver or FW at early boot phase or > >> even during execution, depending on features being used. > >> > >> That's the reason, according to Kevin we couldn't hide them. > >> > >> We just know that, in general, they are not used anymore when assigning > >> the device or if accesses are attempted this generally does not block > >> the assignment use case. For example, it is said in > >> https://github.com/qemu/qemu/blob/master/docs/igd-assign.txt that in > >> legacy IGD assignment use case, there may be "a small numbers of DMAR > >> faults when initially assigned". > > > > Hmm, fair enough. That doesn't sound too good, if the device might > > perform arbitrary writes into guest memory once new IOMMU mappings are > > in place. I was wondering if we could report some IOVA ranges as > > "available but avoid if possible". > In Shameer's series we currently reject any vfio dma_map that would fall > into an RMRR (hence the regression on existing USB/GFX use case). With > the relaxable RMRR info we could imagine to let the userspace choose > whether we want to proceed with the dma_map despite the risk or > introduce a vfio_iommu_type1 module option (turned off by default for > not regressing existing USB/GFX passthrough) that would forbid dma_map > on relaxable RMRR regions. Yep, the risk that Jean-Philippe mentions is real, the IGD device has the stolen memory addresses latched into the hardware and we're unable to change that. What we try to do now is trap page table writes to the device and translate them to a VM allocated stolen memory range, which is sufficient for getting a BIOS splash screen, but we really want to assume that the OS level driver just doesn't use the stolen memory range. There was a time when it seemed like we could assume the Intel drivers were heading in that
Re: [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
On Thu, 16 May 2019 15:23:17 +0200 Auger Eric wrote: > Hi Robin, > On 5/16/19 2:46 PM, Robin Murphy wrote: > >> diff --git a/include/linux/iommu.h b/include/linux/iommu.h > >> index ba91666998fb..14a521f85f14 100644 > >> --- a/include/linux/iommu.h > >> +++ b/include/linux/iommu.h > >> @@ -135,6 +135,12 @@ enum iommu_attr { > >> enum iommu_resv_type { > >> /* Memory regions which must be mapped 1:1 at all times */ > >> IOMMU_RESV_DIRECT, > >> + /* > >> + * Memory regions which are advertised to be 1:1 but are > >> + * commonly considered relaxable in some conditions, > >> + * for instance in device assignment use case (USB, Graphics) > >> + */ > >> + IOMMU_RESV_DIRECT_RELAXABLE, > > > > What do you think of s/RELAXABLE/BOOT/ ? My understanding is that these > > regions are only considered relevant until Linux has taken full control > > of the endpoint, and having a slightly more well-defined scope than > > "some conditions" might be nice. > That's not my current understanding. I think those RMRRs may be used > post-boot (especially the IGD stolen memory covered by RMRR). I > understand this depends on the video mode or FW in use by the IGD. But I > am definitively not an expert here. Nor am I, but generally the distinction I'm trying to achieve is whether the reserved region is necessary for the device operation or for the system operation. If we deny the IGD device its mapping to stolen memory, then maybe the IGD device doesn't work, no big deal. If we deny USB its RMRR, then we assume we're only cutting off PS/2 emulation that we expect isn't used at this point anyway. Both of these are choices in how the driver wants to use the device. On the other hand if we have a system where management firmware has backdoors to devices for system health monitoring, then declining to honor the RMRR has larger implications. Thanks, Alex ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 09/16] iommu: Introduce guest PASID bind function
On Thu, 16 May 2019 15:14:40 +0100 Jean-Philippe Brucker wrote: > Hi Jacob, > > On 03/05/2019 23:32, Jacob Pan wrote: > > +/** > > + * struct gpasid_bind_data - Information about device and guest > > PASID binding > > + * @gcr3: Guest CR3 value from guest mm > > + * @pasid: Process address space ID used for the guest mm > > + * @addr_width:Guest address width. Paging mode can also > > be derived. > > + */ > > +struct gpasid_bind_data { > > + __u64 gcr3; > > + __u32 pasid; > > + __u32 addr_width; > > + __u32 flags; > > +#defineIOMMU_SVA_GPASID_SREBIT(0) /* supervisor > > request */ > > + __u8 padding[4]; > > +}; > > Could you wrap this structure into a generic one like we now do for > bind_pasid_table? It would make the API easier to extend, because if > we ever add individual PASID bind on Arm (something I'd like to do for > virtio-iommu, eventually) it will have different parameters, as our > PASID table entry has a lot of fields describing the page table > format. > > Maybe something like the following would do? > > struct gpasid_bind_data { > #define IOMMU_GPASID_BIND_VERSION_1 1 > __u32 version; > #define IOMMU_GPASID_BIND_FORMAT_INTEL_VTD1 > __u32 format; > union { > // the current gpasid_bind_data: > struct gpasid_bind_intel_vtd vtd; > }; > }; > OK, sounds great. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/4] vfio: vfio_iommu_type1: Define VFIO_IOMMU_INFO_CAPABILITIES
Alex, patch 1 and 3 are s390 specific, 2 and 4 are vfio common code. Are you ok with the common code changes? If yes, would you prefer to have this via the s390 tree (Martin) or your tree? On 10.05.19 10:22, Pierre Morel wrote: > To use the VFIO_IOMMU_GET_INFO to retrieve IOMMU specific information, > we define a new flag VFIO_IOMMU_INFO_CAPABILITIES in the > vfio_iommu_type1_info structure and the associated capability > information block. > > Signed-off-by: Pierre Morel > --- > include/uapi/linux/vfio.h | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > index 8f10748..8f68e0f 100644 > --- a/include/uapi/linux/vfio.h > +++ b/include/uapi/linux/vfio.h > @@ -715,6 +715,16 @@ struct vfio_iommu_type1_info { > __u32 flags; > #define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */ > __u64 iova_pgsizes; /* Bitmap of supported page sizes */ > +#define VFIO_IOMMU_INFO_CAPABILITIES (1 << 1) /* support capabilities info > */ > + __u64 cap_offset; /* Offset within info struct of first cap */ > +}; > + > +#define VFIO_IOMMU_INFO_CAP_QFN 1 > +#define VFIO_IOMMU_INFO_CAP_QGRP 2 > + > +struct vfio_iommu_type1_info_block { > + struct vfio_info_cap_header header; > + __u32 data[]; > }; > > #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12) >
Re: [PATCH v3 09/16] iommu: Introduce guest PASID bind function
Hi Jacob, On 03/05/2019 23:32, Jacob Pan wrote: > +/** > + * struct gpasid_bind_data - Information about device and guest PASID binding > + * @gcr3:Guest CR3 value from guest mm > + * @pasid: Process address space ID used for the guest mm > + * @addr_width: Guest address width. Paging mode can also be derived. > + */ > +struct gpasid_bind_data { > + __u64 gcr3; > + __u32 pasid; > + __u32 addr_width; > + __u32 flags; > +#define IOMMU_SVA_GPASID_SREBIT(0) /* supervisor request */ > + __u8 padding[4]; > +}; Could you wrap this structure into a generic one like we now do for bind_pasid_table? It would make the API easier to extend, because if we ever add individual PASID bind on Arm (something I'd like to do for virtio-iommu, eventually) it will have different parameters, as our PASID table entry has a lot of fields describing the page table format. Maybe something like the following would do? struct gpasid_bind_data { #define IOMMU_GPASID_BIND_VERSION_1 1 __u32 version; #define IOMMU_GPASID_BIND_FORMAT_INTEL_VTD 1 __u32 format; union { // the current gpasid_bind_data: struct gpasid_bind_intel_vtd vtd; }; }; Thanks, Jean
Re: Linux crash when using FTDI FT232R USB to serial converter on AMD boards.
Dne 16.5.2019 v 15:11 Oliver Neukum napsal(a): You will be asked whether this worked in earlier version of the kernel. The answer would be: yes, no, unknown (why) HTH Oliver Thank you, I sent new report. starosta ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Linux crash when using FTDI FT232R USB to serial converter on AMD boards.
On Do, 2019-05-16 at 14:29 +0200, starost...@gmail.com wrote: > Dne 16.5.2019 v 10:34 Oliver Neukum napsal(a): > > Mention > > > > 1. kernel version > > 2. whether this is known to be a regression > > Sorry for question, can you more specify what you mean? You will be asked whether this worked in earlier version of the kernel. The answer would be: yes, no, unknown (why) HTH Oliver ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
Hi Robin, On 5/16/19 2:46 PM, Robin Murphy wrote: > On 16/05/2019 11:08, Eric Auger wrote: >> Introduce a new type for reserved region. This corresponds >> to directly mapped regions which are known to be relaxable >> in some specific conditions, such as device assignment use >> case. Well known examples are those used by USB controllers >> providing PS/2 keyboard emulation for pre-boot BIOS and >> early BOOT or RMRRs associated to IGD working in legacy mode. >> >> Since commit c875d2c1b808 ("iommu/vt-d: Exclude devices using RMRRs >> from IOMMU API domains") and commit 18436afdc11a ("iommu/vt-d: Allow >> RMRR on graphics devices too"), those regions are currently >> considered "safe" with respect to device assignment use case >> which requires a non direct mapping at IOMMU physical level >> (RAM GPA -> HPA mapping). >> >> Those RMRRs currently exist and sometimes the device is >> attempting to access it but this has not been considered >> an issue until now. >> >> However at the moment, iommu_get_group_resv_regions() is >> not able to make any difference between directly mapped >> regions: those which must be absolutely enforced and those >> like above ones which are known as relaxable. >> >> This is a blocker for reporting severe conflicts between >> non relaxable RMRRs (like MSI doorbells) and guest GPA space. >> >> With this new reserved region type we will be able to use >> iommu_get_group_resv_regions() to enumerate the IOVA space >> that is usable through the IOMMU API without introducing >> regressions with respect to existing device assignment >> use cases (USB and IGD). >> >> Signed-off-by: Eric Auger >> >> --- >> >> Note: At the moment the sysfs ABI is not changed. However I wonder >> whether it wouldn't be preferable to report the direct region as >> "direct_relaxed" there. At the moment, in case the same direct >> region is used by 2 devices, one USB/GFX and another not belonging >> to the previous categories, the direct region will be output twice >> with "direct" type. > > Hmm, that sounds a bit off - if we have overlapping regions within the > same domain, then we need to do some additional pre-processing to adjust > them anyway, since any part of a relaxable region which overlaps a > non-relaxable region cannot actually be relaxed, and so really should > never be described as such. In iommu_insert_resv_region(), we are overlapping regions of the same type. So iommu_get_group_resv_regions() should return both the relaxable region and non relaxable one. I should test this again using a hacked kernel though. > >> This would unblock Shameer's series: >> [PATCH v6 0/7] vfio/type1: Add support for valid iova list management >> https://patchwork.kernel.org/patch/10425309/ >> >> which failed to get pulled for 4.18 merge window due to IGD >> device assignment regression. >> >> v2 -> v3: >> - fix direct type check >> --- >> drivers/iommu/iommu.c | 12 +++- >> include/linux/iommu.h | 6 ++ >> 2 files changed, 13 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c >> index ae4ea5c0e6f9..28c3d6351832 100644 >> --- a/drivers/iommu/iommu.c >> +++ b/drivers/iommu/iommu.c >> @@ -73,10 +73,11 @@ struct iommu_group_attribute { >> }; >> static const char * const iommu_group_resv_type_string[] = { >> - [IOMMU_RESV_DIRECT] = "direct", >> - [IOMMU_RESV_RESERVED] = "reserved", >> - [IOMMU_RESV_MSI] = "msi", >> - [IOMMU_RESV_SW_MSI] = "msi", >> + [IOMMU_RESV_DIRECT] = "direct", >> + [IOMMU_RESV_DIRECT_RELAXABLE] = "direct", > > Wasn't part of the idea that we might not need to expose these to > userspace at all if they're only relevant to default domains, and not to > VFIO domains or anything else userspace can get involved with? Fur sure, today, those regions are exposed as "direct". Isn't this information relevant to the userspace as it would rather not use those direct regions. We eventually chose to implement the check inside the VFIO driver but I understood we wanted this info to be visible. Then it is arguable whether we should expose the new relaxable type, at the pain of making the UABI evolve. > >> + [IOMMU_RESV_RESERVED] = "reserved", >> + [IOMMU_RESV_MSI] = "msi", >> + [IOMMU_RESV_SW_MSI] = "msi", >> }; >> #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store) \ >> @@ -573,7 +574,8 @@ static int >> iommu_group_create_direct_mappings(struct iommu_group *group, >> start = ALIGN(entry->start, pg_size); >> end = ALIGN(entry->start + entry->length, pg_size); >> - if (entry->type != IOMMU_RESV_DIRECT) >> + if (entry->type != IOMMU_RESV_DIRECT && >> + entry->type != IOMMU_RESV_DIRECT_RELAXABLE) >> continue; >> for (addr = start; addr < end; addr += pg_size) { >> diff --git a/include/linux/iommu.h b/include/linux/iommu.h >> index ba91666998fb..14a521f85f14 100644 >>
Re: [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
Hi Jean-Philippe, On 5/16/19 2:43 PM, Jean-Philippe Brucker wrote: > On 16/05/2019 12:45, Auger Eric wrote: >> Hi Jean-Philippe, >> >> On 5/16/19 1:16 PM, Jean-Philippe Brucker wrote: >>> On 16/05/2019 11:08, Eric Auger wrote: Note: At the moment the sysfs ABI is not changed. However I wonder whether it wouldn't be preferable to report the direct region as "direct_relaxed" there. At the moment, in case the same direct region is used by 2 devices, one USB/GFX and another not belonging to the previous categories, the direct region will be output twice with "direct" type. This would unblock Shameer's series: [PATCH v6 0/7] vfio/type1: Add support for valid iova list management https://patchwork.kernel.org/patch/10425309/ >>> >>> Thanks for doing this! >>> which failed to get pulled for 4.18 merge window due to IGD device assignment regression. v2 -> v3: - fix direct type check --- drivers/iommu/iommu.c | 12 +++- include/linux/iommu.h | 6 ++ 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index ae4ea5c0e6f9..28c3d6351832 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -73,10 +73,11 @@ struct iommu_group_attribute { }; static const char * const iommu_group_resv_type_string[] = { - [IOMMU_RESV_DIRECT] = "direct", - [IOMMU_RESV_RESERVED] = "reserved", - [IOMMU_RESV_MSI]= "msi", - [IOMMU_RESV_SW_MSI] = "msi", + [IOMMU_RESV_DIRECT] = "direct", + [IOMMU_RESV_DIRECT_RELAXABLE] = "direct", + [IOMMU_RESV_RESERVED] = "reserved", + [IOMMU_RESV_MSI]= "msi", + [IOMMU_RESV_SW_MSI] = "msi", }; #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store) \ @@ -573,7 +574,8 @@ static int iommu_group_create_direct_mappings(struct iommu_group *group, start = ALIGN(entry->start, pg_size); end = ALIGN(entry->start + entry->length, pg_size); - if (entry->type != IOMMU_RESV_DIRECT) + if (entry->type != IOMMU_RESV_DIRECT && + entry->type != IOMMU_RESV_DIRECT_RELAXABLE) >>> >>> I'm trying to understand why you need to create direct mappings at all >>> for these relaxable regions. In the host the region is needed for legacy >>> device features, which are disabled (and cannot be re-enabled) when >>> assigning the device to a guest? >> This follows Kevin's comment in the thread below: >> https://patchwork.kernel.org/patch/10449103/#21957279 >> >> In normal DMA API host path, those regions need to be 1-1 mapped. They >> are likely to be accessed by the driver or FW at early boot phase or >> even during execution, depending on features being used. >> >> That's the reason, according to Kevin we couldn't hide them. >> >> We just know that, in general, they are not used anymore when assigning >> the device or if accesses are attempted this generally does not block >> the assignment use case. For example, it is said in >> https://github.com/qemu/qemu/blob/master/docs/igd-assign.txt that in >> legacy IGD assignment use case, there may be "a small numbers of DMAR >> faults when initially assigned". > > Hmm, fair enough. That doesn't sound too good, if the device might > perform arbitrary writes into guest memory once new IOMMU mappings are > in place. I was wondering if we could report some IOVA ranges as > "available but avoid if possible". In Shameer's series we currently reject any vfio dma_map that would fall into an RMRR (hence the regression on existing USB/GFX use case). With the relaxable RMRR info we could imagine to let the userspace choose whether we want to proceed with the dma_map despite the risk or introduce a vfio_iommu_type1 module option (turned off by default for not regressing existing USB/GFX passthrough) that would forbid dma_map on relaxable RMRR regions. Thanks Eric If the guest has a vIOMMU, they are > easy to avoid. But I doubt they would ever get used, since probably no > one is going to instantiate a vIOMMU for a graphics device in legacy mode. > > Thanks, > Jean > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
On 16/05/2019 11:08, Eric Auger wrote: Introduce a new type for reserved region. This corresponds to directly mapped regions which are known to be relaxable in some specific conditions, such as device assignment use case. Well known examples are those used by USB controllers providing PS/2 keyboard emulation for pre-boot BIOS and early BOOT or RMRRs associated to IGD working in legacy mode. Since commit c875d2c1b808 ("iommu/vt-d: Exclude devices using RMRRs from IOMMU API domains") and commit 18436afdc11a ("iommu/vt-d: Allow RMRR on graphics devices too"), those regions are currently considered "safe" with respect to device assignment use case which requires a non direct mapping at IOMMU physical level (RAM GPA -> HPA mapping). Those RMRRs currently exist and sometimes the device is attempting to access it but this has not been considered an issue until now. However at the moment, iommu_get_group_resv_regions() is not able to make any difference between directly mapped regions: those which must be absolutely enforced and those like above ones which are known as relaxable. This is a blocker for reporting severe conflicts between non relaxable RMRRs (like MSI doorbells) and guest GPA space. With this new reserved region type we will be able to use iommu_get_group_resv_regions() to enumerate the IOVA space that is usable through the IOMMU API without introducing regressions with respect to existing device assignment use cases (USB and IGD). Signed-off-by: Eric Auger --- Note: At the moment the sysfs ABI is not changed. However I wonder whether it wouldn't be preferable to report the direct region as "direct_relaxed" there. At the moment, in case the same direct region is used by 2 devices, one USB/GFX and another not belonging to the previous categories, the direct region will be output twice with "direct" type. Hmm, that sounds a bit off - if we have overlapping regions within the same domain, then we need to do some additional pre-processing to adjust them anyway, since any part of a relaxable region which overlaps a non-relaxable region cannot actually be relaxed, and so really should never be described as such. This would unblock Shameer's series: [PATCH v6 0/7] vfio/type1: Add support for valid iova list management https://patchwork.kernel.org/patch/10425309/ which failed to get pulled for 4.18 merge window due to IGD device assignment regression. v2 -> v3: - fix direct type check --- drivers/iommu/iommu.c | 12 +++- include/linux/iommu.h | 6 ++ 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index ae4ea5c0e6f9..28c3d6351832 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -73,10 +73,11 @@ struct iommu_group_attribute { }; static const char * const iommu_group_resv_type_string[] = { - [IOMMU_RESV_DIRECT] = "direct", - [IOMMU_RESV_RESERVED] = "reserved", - [IOMMU_RESV_MSI]= "msi", - [IOMMU_RESV_SW_MSI] = "msi", + [IOMMU_RESV_DIRECT] = "direct", + [IOMMU_RESV_DIRECT_RELAXABLE] = "direct", Wasn't part of the idea that we might not need to expose these to userspace at all if they're only relevant to default domains, and not to VFIO domains or anything else userspace can get involved with? + [IOMMU_RESV_RESERVED] = "reserved", + [IOMMU_RESV_MSI]= "msi", + [IOMMU_RESV_SW_MSI] = "msi", }; #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store) \ @@ -573,7 +574,8 @@ static int iommu_group_create_direct_mappings(struct iommu_group *group, start = ALIGN(entry->start, pg_size); end = ALIGN(entry->start + entry->length, pg_size); - if (entry->type != IOMMU_RESV_DIRECT) + if (entry->type != IOMMU_RESV_DIRECT && + entry->type != IOMMU_RESV_DIRECT_RELAXABLE) continue; for (addr = start; addr < end; addr += pg_size) { diff --git a/include/linux/iommu.h b/include/linux/iommu.h index ba91666998fb..14a521f85f14 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -135,6 +135,12 @@ enum iommu_attr { enum iommu_resv_type { /* Memory regions which must be mapped 1:1 at all times */ IOMMU_RESV_DIRECT, + /* +* Memory regions which are advertised to be 1:1 but are +* commonly considered relaxable in some conditions, +* for instance in device assignment use case (USB, Graphics) +*/ + IOMMU_RESV_DIRECT_RELAXABLE, What do you think of s/RELAXABLE/BOOT/ ? My understanding is that these regions are only considered relevant until Linux has taken full control of the endpoint, and having a slightly more well-defined scope than "some conditions" might be nice. There's an open question of how to handle things like simple-fb and EFI framebuffer handover on
Re: [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
On 16/05/2019 12:45, Auger Eric wrote: > Hi Jean-Philippe, > > On 5/16/19 1:16 PM, Jean-Philippe Brucker wrote: >> On 16/05/2019 11:08, Eric Auger wrote: >>> Note: At the moment the sysfs ABI is not changed. However I wonder >>> whether it wouldn't be preferable to report the direct region as >>> "direct_relaxed" there. At the moment, in case the same direct >>> region is used by 2 devices, one USB/GFX and another not belonging >>> to the previous categories, the direct region will be output twice >>> with "direct" type. >>> >>> This would unblock Shameer's series: >>> [PATCH v6 0/7] vfio/type1: Add support for valid iova list management >>> https://patchwork.kernel.org/patch/10425309/ >> >> Thanks for doing this! >> >>> which failed to get pulled for 4.18 merge window due to IGD >>> device assignment regression. >>> >>> v2 -> v3: >>> - fix direct type check >>> --- >>> drivers/iommu/iommu.c | 12 +++- >>> include/linux/iommu.h | 6 ++ >>> 2 files changed, 13 insertions(+), 5 deletions(-) >>> >>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c >>> index ae4ea5c0e6f9..28c3d6351832 100644 >>> --- a/drivers/iommu/iommu.c >>> +++ b/drivers/iommu/iommu.c >>> @@ -73,10 +73,11 @@ struct iommu_group_attribute { >>> }; >>> >>> static const char * const iommu_group_resv_type_string[] = { >>> - [IOMMU_RESV_DIRECT] = "direct", >>> - [IOMMU_RESV_RESERVED] = "reserved", >>> - [IOMMU_RESV_MSI]= "msi", >>> - [IOMMU_RESV_SW_MSI] = "msi", >>> + [IOMMU_RESV_DIRECT] = "direct", >>> + [IOMMU_RESV_DIRECT_RELAXABLE] = "direct", >>> + [IOMMU_RESV_RESERVED] = "reserved", >>> + [IOMMU_RESV_MSI]= "msi", >>> + [IOMMU_RESV_SW_MSI] = "msi", >>> }; >>> >>> #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store) \ >>> @@ -573,7 +574,8 @@ static int iommu_group_create_direct_mappings(struct >>> iommu_group *group, >>> start = ALIGN(entry->start, pg_size); >>> end = ALIGN(entry->start + entry->length, pg_size); >>> >>> - if (entry->type != IOMMU_RESV_DIRECT) >>> + if (entry->type != IOMMU_RESV_DIRECT && >>> + entry->type != IOMMU_RESV_DIRECT_RELAXABLE) >> >> I'm trying to understand why you need to create direct mappings at all >> for these relaxable regions. In the host the region is needed for legacy >> device features, which are disabled (and cannot be re-enabled) when >> assigning the device to a guest? > This follows Kevin's comment in the thread below: > https://patchwork.kernel.org/patch/10449103/#21957279 > > In normal DMA API host path, those regions need to be 1-1 mapped. They > are likely to be accessed by the driver or FW at early boot phase or > even during execution, depending on features being used. > > That's the reason, according to Kevin we couldn't hide them. > > We just know that, in general, they are not used anymore when assigning > the device or if accesses are attempted this generally does not block > the assignment use case. For example, it is said in > https://github.com/qemu/qemu/blob/master/docs/igd-assign.txt that in > legacy IGD assignment use case, there may be "a small numbers of DMAR > faults when initially assigned". Hmm, fair enough. That doesn't sound too good, if the device might perform arbitrary writes into guest memory once new IOMMU mappings are in place. I was wondering if we could report some IOVA ranges as "available but avoid if possible". If the guest has a vIOMMU, they are easy to avoid. But I doubt they would ever get used, since probably no one is going to instantiate a vIOMMU for a graphics device in legacy mode. Thanks, Jean
Re: Linux crash when using FTDI FT232R USB to serial converter on AMD boards.
Dne 16.5.2019 v 10:34 Oliver Neukum napsal(a): Mention 1. kernel version 2. whether this is known to be a regression Sorry for question, can you more specify what you mean? 3. include the log with iommu disabled and mention that you disabled the IOMMU 4. Include the output of lsusb HTH Oliver Thank you starosta ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
Hi Jean-Philippe, On 5/16/19 1:16 PM, Jean-Philippe Brucker wrote: > On 16/05/2019 11:08, Eric Auger wrote: >> Note: At the moment the sysfs ABI is not changed. However I wonder >> whether it wouldn't be preferable to report the direct region as >> "direct_relaxed" there. At the moment, in case the same direct >> region is used by 2 devices, one USB/GFX and another not belonging >> to the previous categories, the direct region will be output twice >> with "direct" type. >> >> This would unblock Shameer's series: >> [PATCH v6 0/7] vfio/type1: Add support for valid iova list management >> https://patchwork.kernel.org/patch/10425309/ > > Thanks for doing this! > >> which failed to get pulled for 4.18 merge window due to IGD >> device assignment regression. >> >> v2 -> v3: >> - fix direct type check >> --- >> drivers/iommu/iommu.c | 12 +++- >> include/linux/iommu.h | 6 ++ >> 2 files changed, 13 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c >> index ae4ea5c0e6f9..28c3d6351832 100644 >> --- a/drivers/iommu/iommu.c >> +++ b/drivers/iommu/iommu.c >> @@ -73,10 +73,11 @@ struct iommu_group_attribute { >> }; >> >> static const char * const iommu_group_resv_type_string[] = { >> -[IOMMU_RESV_DIRECT] = "direct", >> -[IOMMU_RESV_RESERVED] = "reserved", >> -[IOMMU_RESV_MSI]= "msi", >> -[IOMMU_RESV_SW_MSI] = "msi", >> +[IOMMU_RESV_DIRECT] = "direct", >> +[IOMMU_RESV_DIRECT_RELAXABLE] = "direct", >> +[IOMMU_RESV_RESERVED] = "reserved", >> +[IOMMU_RESV_MSI]= "msi", >> +[IOMMU_RESV_SW_MSI] = "msi", >> }; >> >> #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store) \ >> @@ -573,7 +574,8 @@ static int iommu_group_create_direct_mappings(struct >> iommu_group *group, >> start = ALIGN(entry->start, pg_size); >> end = ALIGN(entry->start + entry->length, pg_size); >> >> -if (entry->type != IOMMU_RESV_DIRECT) >> +if (entry->type != IOMMU_RESV_DIRECT && >> +entry->type != IOMMU_RESV_DIRECT_RELAXABLE) > > I'm trying to understand why you need to create direct mappings at all > for these relaxable regions. In the host the region is needed for legacy > device features, which are disabled (and cannot be re-enabled) when > assigning the device to a guest? This follows Kevin's comment in the thread below: https://patchwork.kernel.org/patch/10449103/#21957279 In normal DMA API host path, those regions need to be 1-1 mapped. They are likely to be accessed by the driver or FW at early boot phase or even during execution, depending on features being used. That's the reason, according to Kevin we couldn't hide them. We just know that, in general, they are not used anymore when assigning the device or if accesses are attempted this generally does not block the assignment use case. For example, it is said in https://github.com/qemu/qemu/blob/master/docs/igd-assign.txt that in legacy IGD assignment use case, there may be "a small numbers of DMAR faults when initially assigned". Thanks Eric > > Thanks, > Jean > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
On 16/05/2019 11:08, Eric Auger wrote: > Note: At the moment the sysfs ABI is not changed. However I wonder > whether it wouldn't be preferable to report the direct region as > "direct_relaxed" there. At the moment, in case the same direct > region is used by 2 devices, one USB/GFX and another not belonging > to the previous categories, the direct region will be output twice > with "direct" type. > > This would unblock Shameer's series: > [PATCH v6 0/7] vfio/type1: Add support for valid iova list management > https://patchwork.kernel.org/patch/10425309/ Thanks for doing this! > which failed to get pulled for 4.18 merge window due to IGD > device assignment regression. > > v2 -> v3: > - fix direct type check > --- > drivers/iommu/iommu.c | 12 +++- > include/linux/iommu.h | 6 ++ > 2 files changed, 13 insertions(+), 5 deletions(-) > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > index ae4ea5c0e6f9..28c3d6351832 100644 > --- a/drivers/iommu/iommu.c > +++ b/drivers/iommu/iommu.c > @@ -73,10 +73,11 @@ struct iommu_group_attribute { > }; > > static const char * const iommu_group_resv_type_string[] = { > - [IOMMU_RESV_DIRECT] = "direct", > - [IOMMU_RESV_RESERVED] = "reserved", > - [IOMMU_RESV_MSI]= "msi", > - [IOMMU_RESV_SW_MSI] = "msi", > + [IOMMU_RESV_DIRECT] = "direct", > + [IOMMU_RESV_DIRECT_RELAXABLE] = "direct", > + [IOMMU_RESV_RESERVED] = "reserved", > + [IOMMU_RESV_MSI]= "msi", > + [IOMMU_RESV_SW_MSI] = "msi", > }; > > #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)\ > @@ -573,7 +574,8 @@ static int iommu_group_create_direct_mappings(struct > iommu_group *group, > start = ALIGN(entry->start, pg_size); > end = ALIGN(entry->start + entry->length, pg_size); > > - if (entry->type != IOMMU_RESV_DIRECT) > + if (entry->type != IOMMU_RESV_DIRECT && > + entry->type != IOMMU_RESV_DIRECT_RELAXABLE) I'm trying to understand why you need to create direct mappings at all for these relaxable regions. In the host the region is needed for legacy device features, which are disabled (and cannot be re-enabled) when assigning the device to a guest? Thanks, Jean
[PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
Introduce a new type for reserved region. This corresponds to directly mapped regions which are known to be relaxable in some specific conditions, such as device assignment use case. Well known examples are those used by USB controllers providing PS/2 keyboard emulation for pre-boot BIOS and early BOOT or RMRRs associated to IGD working in legacy mode. Since commit c875d2c1b808 ("iommu/vt-d: Exclude devices using RMRRs from IOMMU API domains") and commit 18436afdc11a ("iommu/vt-d: Allow RMRR on graphics devices too"), those regions are currently considered "safe" with respect to device assignment use case which requires a non direct mapping at IOMMU physical level (RAM GPA -> HPA mapping). Those RMRRs currently exist and sometimes the device is attempting to access it but this has not been considered an issue until now. However at the moment, iommu_get_group_resv_regions() is not able to make any difference between directly mapped regions: those which must be absolutely enforced and those like above ones which are known as relaxable. This is a blocker for reporting severe conflicts between non relaxable RMRRs (like MSI doorbells) and guest GPA space. With this new reserved region type we will be able to use iommu_get_group_resv_regions() to enumerate the IOVA space that is usable through the IOMMU API without introducing regressions with respect to existing device assignment use cases (USB and IGD). Signed-off-by: Eric Auger --- Note: At the moment the sysfs ABI is not changed. However I wonder whether it wouldn't be preferable to report the direct region as "direct_relaxed" there. At the moment, in case the same direct region is used by 2 devices, one USB/GFX and another not belonging to the previous categories, the direct region will be output twice with "direct" type. This would unblock Shameer's series: [PATCH v6 0/7] vfio/type1: Add support for valid iova list management https://patchwork.kernel.org/patch/10425309/ which failed to get pulled for 4.18 merge window due to IGD device assignment regression. v2 -> v3: - fix direct type check --- drivers/iommu/iommu.c | 12 +++- include/linux/iommu.h | 6 ++ 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index ae4ea5c0e6f9..28c3d6351832 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -73,10 +73,11 @@ struct iommu_group_attribute { }; static const char * const iommu_group_resv_type_string[] = { - [IOMMU_RESV_DIRECT] = "direct", - [IOMMU_RESV_RESERVED] = "reserved", - [IOMMU_RESV_MSI]= "msi", - [IOMMU_RESV_SW_MSI] = "msi", + [IOMMU_RESV_DIRECT] = "direct", + [IOMMU_RESV_DIRECT_RELAXABLE] = "direct", + [IOMMU_RESV_RESERVED] = "reserved", + [IOMMU_RESV_MSI]= "msi", + [IOMMU_RESV_SW_MSI] = "msi", }; #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store) \ @@ -573,7 +574,8 @@ static int iommu_group_create_direct_mappings(struct iommu_group *group, start = ALIGN(entry->start, pg_size); end = ALIGN(entry->start + entry->length, pg_size); - if (entry->type != IOMMU_RESV_DIRECT) + if (entry->type != IOMMU_RESV_DIRECT && + entry->type != IOMMU_RESV_DIRECT_RELAXABLE) continue; for (addr = start; addr < end; addr += pg_size) { diff --git a/include/linux/iommu.h b/include/linux/iommu.h index ba91666998fb..14a521f85f14 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -135,6 +135,12 @@ enum iommu_attr { enum iommu_resv_type { /* Memory regions which must be mapped 1:1 at all times */ IOMMU_RESV_DIRECT, + /* +* Memory regions which are advertised to be 1:1 but are +* commonly considered relaxable in some conditions, +* for instance in device assignment use case (USB, Graphics) +*/ + IOMMU_RESV_DIRECT_RELAXABLE, /* Arbitrary "never map this or give it to a device" address ranges */ IOMMU_RESV_RESERVED, /* Hardware MSI region (untranslated) */ -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 7/7] iommu/vt-d: Differentiate relaxable and non relaxable RMRRs
Now we have a new IOMMU_RESV_DIRECT_RELAXABLE reserved memory region type, let's report USB and GFX RMRRs as relaxable ones. This allows to have a finer reporting at IOMMU API level of reserved memory regions. This will be exploitable by VFIO to define the usable IOVA range and detect potential conflicts between the guest physical address space and host reserved regions. Signed-off-by: Eric Auger --- drivers/iommu/intel-iommu.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index a36604f4900f..af1d65fdedfc 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -5493,7 +5493,9 @@ static void intel_iommu_get_resv_regions(struct device *device, for_each_rmrr_units(rmrr) { for_each_active_dev_scope(rmrr->devices, rmrr->devices_cnt, i, i_dev) { + struct pci_dev *pdev = to_pci_dev(device); struct iommu_resv_region *resv; + enum iommu_resv_type type; size_t length; if (i_dev != device && @@ -5501,9 +5503,13 @@ static void intel_iommu_get_resv_regions(struct device *device, continue; length = rmrr->end_address - rmrr->base_address + 1; + + type = (pdev && + (IS_USB_DEVICE(pdev) || IS_GFX_DEVICE(pdev))) ? + IOMMU_RESV_DIRECT_RELAXABLE : IOMMU_RESV_DIRECT; + resv = iommu_alloc_resv_region(rmrr->base_address, - length, prot, - IOMMU_RESV_DIRECT, + length, prot, type, GFP_ATOMIC); if (!resv) break; -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 1/7] iommu: Pass a GFP flag parameter to iommu_alloc_resv_region()
We plan to call iommu_alloc_resv_region in a non preemptible section. Pass a GFP flag to the function and update all the call sites. Signed-off-by: Eric Auger --- drivers/acpi/arm64/iort.c | 3 ++- drivers/iommu/amd_iommu.c | 7 --- drivers/iommu/arm-smmu-v3.c | 2 +- drivers/iommu/arm-smmu.c| 2 +- drivers/iommu/intel-iommu.c | 4 ++-- drivers/iommu/iommu.c | 7 --- include/linux/iommu.h | 2 +- 7 files changed, 15 insertions(+), 12 deletions(-) diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c index 9058cb084b91..8313b2d1b710 100644 --- a/drivers/acpi/arm64/iort.c +++ b/drivers/acpi/arm64/iort.c @@ -868,7 +868,8 @@ int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head) struct iommu_resv_region *region; region = iommu_alloc_resv_region(base + SZ_64K, SZ_64K, -prot, IOMMU_RESV_MSI); +prot, IOMMU_RESV_MSI, +GFP_KERNEL); if (region) { list_add_tail(>list, head); resv++; diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 09c9e45f7fa2..f2eb8e9cd8a6 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -3136,7 +3136,8 @@ static void amd_iommu_get_resv_regions(struct device *dev, type = IOMMU_RESV_RESERVED; region = iommu_alloc_resv_region(entry->address_start, -length, prot, type); +length, prot, type, +GFP_KERNEL); if (!region) { dev_err(dev, "Out of memory allocating dm-regions\n"); return; @@ -3146,14 +3147,14 @@ static void amd_iommu_get_resv_regions(struct device *dev, region = iommu_alloc_resv_region(MSI_RANGE_START, MSI_RANGE_END - MSI_RANGE_START + 1, -0, IOMMU_RESV_MSI); +0, IOMMU_RESV_MSI, GFP_KERNEL); if (!region) return; list_add_tail(>list, head); region = iommu_alloc_resv_region(HT_RANGE_START, HT_RANGE_END - HT_RANGE_START + 1, -0, IOMMU_RESV_RESERVED); +0, IOMMU_RESV_RESERVED, GFP_KERNEL); if (!region) return; list_add_tail(>list, head); diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 4d5a694f02c2..f9b1279ef5bf 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -2226,7 +2226,7 @@ static void arm_smmu_get_resv_regions(struct device *dev, int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO; region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH, -prot, IOMMU_RESV_SW_MSI); +prot, IOMMU_RESV_SW_MSI, GFP_KERNEL); if (!region) return; diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 5e54cc0a28b3..646e76813e91 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -1670,7 +1670,7 @@ static void arm_smmu_get_resv_regions(struct device *dev, int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO; region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH, -prot, IOMMU_RESV_SW_MSI); +prot, IOMMU_RESV_SW_MSI, GFP_KERNEL); if (!region) return; diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index a209199f3af6..2be36dff189a 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -4220,7 +4220,7 @@ int __init dmar_parse_one_rmrr(struct acpi_dmar_header *header, void *arg) length = rmrr->end_address - rmrr->base_address + 1; rmrru->resv = iommu_alloc_resv_region(rmrr->base_address, length, prot, - IOMMU_RESV_DIRECT); + IOMMU_RESV_DIRECT, GFP_KERNEL); if (!rmrru->resv) goto free_rmrru; @@ -5489,7 +5489,7 @@ static void intel_iommu_get_resv_regions(struct device *device, reg = iommu_alloc_resv_region(IOAPIC_RANGE_START, IOAPIC_RANGE_END - IOAPIC_RANGE_START + 1, - 0, IOMMU_RESV_MSI); + 0, IOMMU_RESV_MSI, GFP_KERNEL); if (!reg) return; list_add_tail(>list,
[PATCH v3 5/7] iommu/vt-d: Handle PCI bridge RMRR device scopes in intel_iommu_get_resv_regions
In the case the RMRR device scope is a PCI-PCI bridge, let's check the device belongs to the PCI sub-hierarchy. Fixes: 0659b8dc45a6 ("iommu/vt-d: Implement reserved region get/put callbacks") Signed-off-by: Eric Auger --- drivers/iommu/intel-iommu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 7ed820e79313..a36604f4900f 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -5496,7 +5496,8 @@ static void intel_iommu_get_resv_regions(struct device *device, struct iommu_resv_region *resv; size_t length; - if (i_dev != device) + if (i_dev != device && + !is_downstream_to_pci_bridge(device, i_dev)) continue; length = rmrr->end_address - rmrr->base_address + 1; -- 2.20.1
[PATCH v3 4/7] iommu/vt-d: Handle RMRR with PCI bridge device scopes
When reading the vtd specification and especially the Reserved Memory Region Reporting Structure chapter, it is not obvious a device scope element cannot be a PCI-PCI bridge, in which case all downstream ports are likely to access the reserved memory region. Let's handle this case in device_has_rmrr. Fixes: ea2447f700ca ("intel-iommu: Prevent devices with RMRRs from being placed into SI Domain") Signed-off-by: Eric Auger --- v1 -> v2: - is_downstream_to_pci_bridge helper introduced in a separate patch --- drivers/iommu/intel-iommu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 15c2f9677491..7ed820e79313 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -2910,7 +2910,8 @@ static bool device_has_rmrr(struct device *dev) */ for_each_active_dev_scope(rmrr->devices, rmrr->devices_cnt, i, tmp) - if (tmp == dev) { + if (tmp == dev || + is_downstream_to_pci_bridge(dev, tmp)) { rcu_read_unlock(); return true; } -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 2/7] iommu/vt-d: Duplicate iommu_resv_region objects per device list
intel_iommu_get_resv_regions() aims to return the list of reserved regions accessible by a given @device. However several devices can access the same reserved memory region and when building the list it is not safe to use a single iommu_resv_region object, whose container is the RMRR. This iommu_resv_region must be duplicated per device reserved region list. Let's remove the struct iommu_resv_region from the RMRR unit and allocate the iommu_resv_region directly in intel_iommu_get_resv_regions(). Fixes: 0659b8dc45a6 ("iommu/vt-d: Implement reserved region get/put callbacks") Signed-off-by: Eric Auger --- drivers/iommu/intel-iommu.c | 31 --- 1 file changed, 16 insertions(+), 15 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 2be36dff189a..590a0e78d11d 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -322,7 +322,6 @@ struct dmar_rmrr_unit { u64 end_address;/* reserved end address */ struct dmar_dev_scope *devices; /* target devices */ int devices_cnt;/* target device count */ - struct iommu_resv_region *resv; /* reserved region handle */ }; struct dmar_atsr_unit { @@ -4205,7 +4204,6 @@ static inline void init_iommu_pm_ops(void) {} int __init dmar_parse_one_rmrr(struct acpi_dmar_header *header, void *arg) { struct acpi_dmar_reserved_memory *rmrr; - int prot = DMA_PTE_READ|DMA_PTE_WRITE; struct dmar_rmrr_unit *rmrru; size_t length; @@ -4219,22 +4217,16 @@ int __init dmar_parse_one_rmrr(struct acpi_dmar_header *header, void *arg) rmrru->end_address = rmrr->end_address; length = rmrr->end_address - rmrr->base_address + 1; - rmrru->resv = iommu_alloc_resv_region(rmrr->base_address, length, prot, - IOMMU_RESV_DIRECT, GFP_KERNEL); - if (!rmrru->resv) - goto free_rmrru; rmrru->devices = dmar_alloc_dev_scope((void *)(rmrr + 1), ((void *)rmrr) + rmrr->header.length, >devices_cnt); if (rmrru->devices_cnt && rmrru->devices == NULL) - goto free_all; + goto free_rmrru; list_add(>list, _rmrr_units); return 0; -free_all: - kfree(rmrru->resv); free_rmrru: kfree(rmrru); out: @@ -4452,7 +,6 @@ static void intel_iommu_free_dmars(void) list_for_each_entry_safe(rmrru, rmrr_n, _rmrr_units, list) { list_del(>list); dmar_free_dev_scope(>devices, >devices_cnt); - kfree(rmrru->resv); kfree(rmrru); } @@ -5470,6 +5461,7 @@ static void intel_iommu_remove_device(struct device *dev) static void intel_iommu_get_resv_regions(struct device *device, struct list_head *head) { + int prot = DMA_PTE_READ|DMA_PTE_WRITE; struct iommu_resv_region *reg; struct dmar_rmrr_unit *rmrr; struct device *i_dev; @@ -5479,10 +5471,21 @@ static void intel_iommu_get_resv_regions(struct device *device, for_each_rmrr_units(rmrr) { for_each_active_dev_scope(rmrr->devices, rmrr->devices_cnt, i, i_dev) { + struct iommu_resv_region *resv; + size_t length; + if (i_dev != device) continue; - list_add_tail(>resv->list, head); + length = rmrr->end_address - rmrr->base_address + 1; + resv = iommu_alloc_resv_region(rmrr->base_address, + length, prot, + IOMMU_RESV_DIRECT, + GFP_ATOMIC); + if (!resv) + break; + + list_add_tail(>list, head); } } rcu_read_unlock(); @@ -5500,10 +5503,8 @@ static void intel_iommu_put_resv_regions(struct device *dev, { struct iommu_resv_region *entry, *next; - list_for_each_entry_safe(entry, next, head, list) { - if (entry->type == IOMMU_RESV_MSI) - kfree(entry); - } + list_for_each_entry_safe(entry, next, head, list) + kfree(entry); } int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev) -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 3/7] iommu/vt-d: Introduce is_downstream_to_pci_bridge helper
Several call sites are about to check whether a device belongs to the PCI sub-hierarchy of a candidate PCI-PCI bridge. Introduce an helper to perform that check. Signed-off-by: Eric Auger --- drivers/iommu/intel-iommu.c | 37 + 1 file changed, 29 insertions(+), 8 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 590a0e78d11d..15c2f9677491 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -736,12 +736,39 @@ static int iommu_dummy(struct device *dev) return dev->archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO; } +/* is_downstream_to_pci_bridge - test if a device belongs to the + * PCI sub-hierarchy of a candidate PCI-PCI bridge + * + * @dev: candidate PCI device belonging to @bridge PCI sub-hierarchy + * @bridge: the candidate PCI-PCI bridge + * + * Return: true if @dev belongs to @bridge PCI sub-hierarchy + */ +static bool +is_downstream_to_pci_bridge(struct device *dev, struct device *bridge) +{ + struct pci_dev *pdev, *pbridge; + + if (!dev_is_pci(dev) || !dev_is_pci(bridge)) + return false; + + pdev = to_pci_dev(dev); + pbridge = to_pci_dev(bridge); + + if (pbridge->subordinate && + pbridge->subordinate->number <= pdev->bus->number && + pbridge->subordinate->busn_res.end >= pdev->bus->number) + return true; + + return false; +} + static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn) { struct dmar_drhd_unit *drhd = NULL; struct intel_iommu *iommu; struct device *tmp; - struct pci_dev *ptmp, *pdev = NULL; + struct pci_dev *pdev = NULL; u16 segment = 0; int i; @@ -787,13 +814,7 @@ static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devf goto out; } - if (!pdev || !dev_is_pci(tmp)) - continue; - - ptmp = to_pci_dev(tmp); - if (ptmp->subordinate && - ptmp->subordinate->number <= pdev->bus->number && - ptmp->subordinate->busn_res.end >= pdev->bus->number) + if (is_downstream_to_pci_bridge(dev, tmp)) goto got_pdev; } -- 2.20.1
[PATCH v3 0/7] RMRR related fixes and enhancements
Currently the Intel reserved region is attached to the RMRR unit and when building the list of RMRR seen by a device we link this unique reserved region without taking care of potential multiple usage of this reserved region by several devices. Also while reading the vtd spec it is unclear to me whether the RMRR device scope referenced by an RMRR ACPI struct could be a PCI-PCI bridge, in which case I think we also need to check the device belongs to the PCI sub-hierarchy of the device referenced in the scope. This would be true for device_has_rmrr() and intel_iommu_get_resv_regions(). Last, the VFIO subsystem would need to compute the usable IOVA range by querying the iommu_get_group_resv_regions() API. This would allow, for instance, to report potential conflicts between the guest physical address space and host reserved regions. However iommu_get_group_resv_regions() currently fails to differentiate RMRRs that are known safe for device assignment and RMRRs that must be enforced. So we introduce a new reserved memory region type (relaxable), reported when associated to an USB or GFX device. The last 2 patches aim at unblocking [1] which is stuck since 4.18. [1-5] are fixes [6-7] are enhancements The two parts can be considered separately if needed. References: [1] [PATCH v6 0/7] vfio/type1: Add support for valid iova list management https://patchwork.kernel.org/patch/10425309/ Branch: This series is available at: https://github.com/eauger/linux/tree/v5.1-rmrr-v3 History: v2 -> v3: s/||/&& in iommu_group_create_direct_mappings v1 -> v2: - introduce is_downstream_to_pci_bridge() in a separate patch, change param names and add kerneldoc comment - add 6,7 Eric Auger (7): iommu: Pass a GFP flag parameter to iommu_alloc_resv_region() iommu/vt-d: Duplicate iommu_resv_region objects per device list iommu/vt-d: Introduce is_downstream_to_pci_bridge helper iommu/vt-d: Handle RMRR with PCI bridge device scopes iommu/vt-d: Handle PCI bridge RMRR device scopes in intel_iommu_get_resv_regions iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions iommu/vt-d: Differentiate relaxable and non relaxable RMRRs drivers/acpi/arm64/iort.c | 3 +- drivers/iommu/amd_iommu.c | 7 ++-- drivers/iommu/arm-smmu-v3.c | 2 +- drivers/iommu/arm-smmu.c| 2 +- drivers/iommu/intel-iommu.c | 82 + drivers/iommu/iommu.c | 19 + include/linux/iommu.h | 8 +++- 7 files changed, 82 insertions(+), 41 deletions(-) -- 2.20.1
Re: [PATCH v2 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
Hi On 5/16/19 10:47 AM, Eric Auger wrote: > Introduce a new type for reserved region. This corresponds > to directly mapped regions which are known to be relaxable > in some specific conditions, such as device assignment use > case. Well known examples are those used by USB controllers > providing PS/2 keyboard emulation for pre-boot BIOS and > early BOOT or RMRRs associated to IGD working in legacy mode. > > Since commit c875d2c1b808 ("iommu/vt-d: Exclude devices using RMRRs > from IOMMU API domains") and commit 18436afdc11a ("iommu/vt-d: Allow > RMRR on graphics devices too"), those regions are currently > considered "safe" with respect to device assignment use case > which requires a non direct mapping at IOMMU physical level > (RAM GPA -> HPA mapping). > > Those RMRRs currently exist and sometimes the device is > attempting to access it but this has not been considered > an issue until now. > > However at the moment, iommu_get_group_resv_regions() is > not able to make any difference between directly mapped > regions: those which must be absolutely enforced and those > like above ones which are known as relaxable. > > This is a blocker for reporting severe conflicts between > non relaxable RMRRs (like MSI doorbells) and guest GPA space. > > With this new reserved region type we will be able to use > iommu_get_group_resv_regions() to enumerate the IOVA space > that is usable through the IOMMU API without introducing > regressions with respect to existing device assignment > use cases (USB and IGD). > > Signed-off-by: Eric Auger > > --- > > Note: At the moment the sysfs ABI is not changed. However I wonder > whether it wouldn't be preferable to report the direct region as > "direct_relaxed" there. At the moment, in case the same direct > region is used by 2 devices, one USB/GFX and another not belonging > to the previous categories, the direct region will be output twice > with "direct" type. > > This would unblock Shameer's series: > [PATCH v6 0/7] vfio/type1: Add support for valid iova list management > https://patchwork.kernel.org/patch/10425309/ > > which failed to get pulled for 4.18 merge window due to IGD > device assignment regression. > --- > drivers/iommu/iommu.c | 12 +++- > include/linux/iommu.h | 6 ++ > 2 files changed, 13 insertions(+), 5 deletions(-) > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > index ae4ea5c0e6f9..84dcb6af6511 100644 > --- a/drivers/iommu/iommu.c > +++ b/drivers/iommu/iommu.c > @@ -73,10 +73,11 @@ struct iommu_group_attribute { > }; > > static const char * const iommu_group_resv_type_string[] = { > - [IOMMU_RESV_DIRECT] = "direct", > - [IOMMU_RESV_RESERVED] = "reserved", > - [IOMMU_RESV_MSI]= "msi", > - [IOMMU_RESV_SW_MSI] = "msi", > + [IOMMU_RESV_DIRECT] = "direct", > + [IOMMU_RESV_DIRECT_RELAXABLE] = "direct", > + [IOMMU_RESV_RESERVED] = "reserved", > + [IOMMU_RESV_MSI]= "msi", > + [IOMMU_RESV_SW_MSI] = "msi", > }; > > #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)\ > @@ -573,7 +574,8 @@ static int iommu_group_create_direct_mappings(struct > iommu_group *group, > start = ALIGN(entry->start, pg_size); > end = ALIGN(entry->start + entry->length, pg_size); > > - if (entry->type != IOMMU_RESV_DIRECT) > + if (entry->type != IOMMU_RESV_DIRECT || It must be a "&&" and not "||" Respinning ... Eric > + entry->type != IOMMU_RESV_DIRECT_RELAXABLE) > continue; > > for (addr = start; addr < end; addr += pg_size) { > diff --git a/include/linux/iommu.h b/include/linux/iommu.h > index ba91666998fb..14a521f85f14 100644 > --- a/include/linux/iommu.h > +++ b/include/linux/iommu.h > @@ -135,6 +135,12 @@ enum iommu_attr { > enum iommu_resv_type { > /* Memory regions which must be mapped 1:1 at all times */ > IOMMU_RESV_DIRECT, > + /* > + * Memory regions which are advertised to be 1:1 but are > + * commonly considered relaxable in some conditions, > + * for instance in device assignment use case (USB, Graphics) > + */ > + IOMMU_RESV_DIRECT_RELAXABLE, > /* Arbitrary "never map this or give it to a device" address ranges */ > IOMMU_RESV_RESERVED, > /* Hardware MSI region (untranslated) */ >
[PATCH v5 1/1] iommu/io-pgtable-arm: Add support to use system cache
Few Qualcomm platforms such as, sdm845 have an additional outer cache called as System cache, aka. Last level cache (LLC) that allows non-coherent devices to upgrade to using caching. This cache sits right before the DDR, and is tightly coupled with the memory controller. The clients using this cache request their slices from this system cache, make it active, and can then start using it. There is a fundamental assumption that non-coherent devices can't access caches. This change adds an exception where they *can* use some level of cache despite still being non-coherent overall. The coherent devices that use cacheable memory, and CPU make use of this system cache by default. Looking at memory types, we have following - a) Normal uncached :- MAIR 0x44, inner non-cacheable, outer non-cacheable; b) Normal cached :- MAIR 0xff, inner read write-back non-transient, outer read write-back non-transient; attribute setting for coherenet I/O devices. and, for non-coherent i/o devices that can allocate in system cache another type gets added - c) Normal sys-cached :- MAIR 0xf4, inner non-cacheable, outer read write-back non-transient Coherent I/O devices use system cache by marking the memory as normal cached. Non-coherent I/O devices should mark the memory as normal sys-cached in page tables to use system cache. Signed-off-by: Vivek Gautam --- V3 version of this patch and related series can be found at [1]. V4 of this patch is available at [2]. The example usage of how a smmu master can make use of this protection flag and set the correct memory attributes to start using system cache, can be found at [3]; and here at [3] IOMMU_UPSTREAM_HINT is same as IOMMU_QCOM_SYS_CACHE. Changes since v4: - Changed ARM_LPAE_MAIR_ATTR_QCOM_SYS_CACHE to ARM_LPAE_MAIR_ATTR_INC_OWBRWA. - Changed ARM_LPAE_MAIR_ATTR_IDX_QCOM_SYS_CACHE to ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE. - Added comments to iommu protection flag - IOMMU_QCOM_SYS_CACHE. Changes since v3: - Dropping support to cache i/o page tables to system cache. Getting support for data buffers is the first step. Removed io-pgtable quirk and related change to add domain attribute. Glmark2 numbers on SDM845 based cheza board: S.No.| with LLC support |without LLC support | for data buffers | --- 1| 4480; 72.3fps |4042; 65.2fps 2| 4500; 72.6fps |4039; 65.1fps 3| 4523; 72.9fps |4106; 66.2fps 4| 4489; 72.4fps |4104; 66.2fps 5| 4518; 72.9fps |4072; 65.7fps [1] https://patchwork.kernel.org/cover/10772629/ [2] https://lore.kernel.org/patchwork/patch/1072936/ [3] https://patchwork.kernel.org/patch/10302791/ drivers/iommu/io-pgtable-arm.c | 9 - include/linux/iommu.h | 6 ++ 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 4e21efbc4459..2454ac11aa97 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -167,10 +167,12 @@ #define ARM_LPAE_MAIR_ATTR_MASK0xff #define ARM_LPAE_MAIR_ATTR_DEVICE 0x04 #define ARM_LPAE_MAIR_ATTR_NC 0x44 +#define ARM_LPAE_MAIR_ATTR_INC_OWBRWA 0xf4 #define ARM_LPAE_MAIR_ATTR_WBRWA 0xff #define ARM_LPAE_MAIR_ATTR_IDX_NC 0 #define ARM_LPAE_MAIR_ATTR_IDX_CACHE 1 #define ARM_LPAE_MAIR_ATTR_IDX_DEV 2 +#define ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE 3 #define ARM_MALI_LPAE_TTBR_ADRMODE_TABLE (3u << 0) #define ARM_MALI_LPAE_TTBR_READ_INNER BIT(2) @@ -470,6 +472,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data, else if (prot & IOMMU_CACHE) pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE << ARM_LPAE_PTE_ATTRINDX_SHIFT); + else if (prot & IOMMU_QCOM_SYS_CACHE) + pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE + << ARM_LPAE_PTE_ATTRINDX_SHIFT); } if (prot & IOMMU_NOEXEC) @@ -857,7 +862,9 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie) (ARM_LPAE_MAIR_ATTR_WBRWA << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_CACHE)) | (ARM_LPAE_MAIR_ATTR_DEVICE - << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV)); + << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV)) | + (ARM_LPAE_MAIR_ATTR_INC_OWBRWA + << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE)); cfg->arm_lpae_s1_cfg.mair[0] = reg; cfg->arm_lpae_s1_cfg.mair[1] = 0; diff --git a/include/linux/iommu.h b/include/linux/iommu.h index a815cf6f6f47..8ee3fbaf5855 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -41,6 +41,12 @@ * if the IOMMU page
[PATCH v2 7/7] iommu/vt-d: Differentiate relaxable and non relaxable RMRRs
Now we have a new IOMMU_RESV_DIRECT_RELAXABLE reserved memory region type, let's report USB and GFX RMRRs as relaxable ones. This allows to have a finer reporting at IOMMU API level of reserved memory regions. This will be exploitable by VFIO to define the usable IOVA range and detect potential conflicts between the guest physical address space and host reserved regions. Signed-off-by: Eric Auger --- drivers/iommu/intel-iommu.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index a36604f4900f..af1d65fdedfc 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -5493,7 +5493,9 @@ static void intel_iommu_get_resv_regions(struct device *device, for_each_rmrr_units(rmrr) { for_each_active_dev_scope(rmrr->devices, rmrr->devices_cnt, i, i_dev) { + struct pci_dev *pdev = to_pci_dev(device); struct iommu_resv_region *resv; + enum iommu_resv_type type; size_t length; if (i_dev != device && @@ -5501,9 +5503,13 @@ static void intel_iommu_get_resv_regions(struct device *device, continue; length = rmrr->end_address - rmrr->base_address + 1; + + type = (pdev && + (IS_USB_DEVICE(pdev) || IS_GFX_DEVICE(pdev))) ? + IOMMU_RESV_DIRECT_RELAXABLE : IOMMU_RESV_DIRECT; + resv = iommu_alloc_resv_region(rmrr->base_address, - length, prot, - IOMMU_RESV_DIRECT, + length, prot, type, GFP_ATOMIC); if (!resv) break; -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
Introduce a new type for reserved region. This corresponds to directly mapped regions which are known to be relaxable in some specific conditions, such as device assignment use case. Well known examples are those used by USB controllers providing PS/2 keyboard emulation for pre-boot BIOS and early BOOT or RMRRs associated to IGD working in legacy mode. Since commit c875d2c1b808 ("iommu/vt-d: Exclude devices using RMRRs from IOMMU API domains") and commit 18436afdc11a ("iommu/vt-d: Allow RMRR on graphics devices too"), those regions are currently considered "safe" with respect to device assignment use case which requires a non direct mapping at IOMMU physical level (RAM GPA -> HPA mapping). Those RMRRs currently exist and sometimes the device is attempting to access it but this has not been considered an issue until now. However at the moment, iommu_get_group_resv_regions() is not able to make any difference between directly mapped regions: those which must be absolutely enforced and those like above ones which are known as relaxable. This is a blocker for reporting severe conflicts between non relaxable RMRRs (like MSI doorbells) and guest GPA space. With this new reserved region type we will be able to use iommu_get_group_resv_regions() to enumerate the IOVA space that is usable through the IOMMU API without introducing regressions with respect to existing device assignment use cases (USB and IGD). Signed-off-by: Eric Auger --- Note: At the moment the sysfs ABI is not changed. However I wonder whether it wouldn't be preferable to report the direct region as "direct_relaxed" there. At the moment, in case the same direct region is used by 2 devices, one USB/GFX and another not belonging to the previous categories, the direct region will be output twice with "direct" type. This would unblock Shameer's series: [PATCH v6 0/7] vfio/type1: Add support for valid iova list management https://patchwork.kernel.org/patch/10425309/ which failed to get pulled for 4.18 merge window due to IGD device assignment regression. --- drivers/iommu/iommu.c | 12 +++- include/linux/iommu.h | 6 ++ 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index ae4ea5c0e6f9..84dcb6af6511 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -73,10 +73,11 @@ struct iommu_group_attribute { }; static const char * const iommu_group_resv_type_string[] = { - [IOMMU_RESV_DIRECT] = "direct", - [IOMMU_RESV_RESERVED] = "reserved", - [IOMMU_RESV_MSI]= "msi", - [IOMMU_RESV_SW_MSI] = "msi", + [IOMMU_RESV_DIRECT] = "direct", + [IOMMU_RESV_DIRECT_RELAXABLE] = "direct", + [IOMMU_RESV_RESERVED] = "reserved", + [IOMMU_RESV_MSI]= "msi", + [IOMMU_RESV_SW_MSI] = "msi", }; #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store) \ @@ -573,7 +574,8 @@ static int iommu_group_create_direct_mappings(struct iommu_group *group, start = ALIGN(entry->start, pg_size); end = ALIGN(entry->start + entry->length, pg_size); - if (entry->type != IOMMU_RESV_DIRECT) + if (entry->type != IOMMU_RESV_DIRECT || + entry->type != IOMMU_RESV_DIRECT_RELAXABLE) continue; for (addr = start; addr < end; addr += pg_size) { diff --git a/include/linux/iommu.h b/include/linux/iommu.h index ba91666998fb..14a521f85f14 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -135,6 +135,12 @@ enum iommu_attr { enum iommu_resv_type { /* Memory regions which must be mapped 1:1 at all times */ IOMMU_RESV_DIRECT, + /* +* Memory regions which are advertised to be 1:1 but are +* commonly considered relaxable in some conditions, +* for instance in device assignment use case (USB, Graphics) +*/ + IOMMU_RESV_DIRECT_RELAXABLE, /* Arbitrary "never map this or give it to a device" address ranges */ IOMMU_RESV_RESERVED, /* Hardware MSI region (untranslated) */ -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 5/7] iommu/vt-d: Handle PCI bridge RMRR device scopes in intel_iommu_get_resv_regions
In the case the RMRR device scope is a PCI-PCI bridge, let's check the device belongs to the PCI sub-hierarchy. Fixes: 0659b8dc45a6 ("iommu/vt-d: Implement reserved region get/put callbacks") Signed-off-by: Eric Auger --- drivers/iommu/intel-iommu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 7ed820e79313..a36604f4900f 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -5496,7 +5496,8 @@ static void intel_iommu_get_resv_regions(struct device *device, struct iommu_resv_region *resv; size_t length; - if (i_dev != device) + if (i_dev != device && + !is_downstream_to_pci_bridge(device, i_dev)) continue; length = rmrr->end_address - rmrr->base_address + 1; -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Linux crash when using FTDI FT232R USB to serial converter on AMD boards.
On Do, 2019-05-16 at 10:20 +0200, starost...@gmail.com wrote: > Dne 16.5.2019 v 9:58 Oliver Neukum napsal(a): > > > > 2.Send a new report to the corresponding mailing list > > > > > > Which mailing list is correct? > > > > In that case linux-...@vger.kernel.org > > > > HTH > > Oliver > > > > Is there some rules how to send bug report? Or I can send report with > "my amateur description"? Mention 1. kernel version 2. whether this is known to be a regression 3. include the log with iommu disabled and mention that you disabled the IOMMU 4. Include the output of lsusb HTH Oliver ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 4/7] iommu/vt-d: Handle RMRR with PCI bridge device scopes
When reading the vtd specification and especially the Reserved Memory Region Reporting Structure chapter, it is not obvious a device scope element cannot be a PCI-PCI bridge, in which case all downstream ports are likely to access the reserved memory region. Let's handle this case in device_has_rmrr. Fixes: ea2447f700ca ("intel-iommu: Prevent devices with RMRRs from being placed into SI Domain") Signed-off-by: Eric Auger --- v1 -> v2: - is_downstream_to_pci_bridge helper introduced in a separate patch --- drivers/iommu/intel-iommu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 15c2f9677491..7ed820e79313 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -2910,7 +2910,8 @@ static bool device_has_rmrr(struct device *dev) */ for_each_active_dev_scope(rmrr->devices, rmrr->devices_cnt, i, tmp) - if (tmp == dev) { + if (tmp == dev || + is_downstream_to_pci_bridge(dev, tmp)) { rcu_read_unlock(); return true; } -- 2.20.1
[PATCH v2 1/7] iommu: Pass a GFP flag parameter to iommu_alloc_resv_region()
We plan to call iommu_alloc_resv_region in a non preemptible section. Pass a GFP flag to the function and update all the call sites. Signed-off-by: Eric Auger --- drivers/acpi/arm64/iort.c | 3 ++- drivers/iommu/amd_iommu.c | 7 --- drivers/iommu/arm-smmu-v3.c | 2 +- drivers/iommu/arm-smmu.c| 2 +- drivers/iommu/intel-iommu.c | 4 ++-- drivers/iommu/iommu.c | 7 --- include/linux/iommu.h | 2 +- 7 files changed, 15 insertions(+), 12 deletions(-) diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c index 9058cb084b91..8313b2d1b710 100644 --- a/drivers/acpi/arm64/iort.c +++ b/drivers/acpi/arm64/iort.c @@ -868,7 +868,8 @@ int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head) struct iommu_resv_region *region; region = iommu_alloc_resv_region(base + SZ_64K, SZ_64K, -prot, IOMMU_RESV_MSI); +prot, IOMMU_RESV_MSI, +GFP_KERNEL); if (region) { list_add_tail(>list, head); resv++; diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 09c9e45f7fa2..f2eb8e9cd8a6 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -3136,7 +3136,8 @@ static void amd_iommu_get_resv_regions(struct device *dev, type = IOMMU_RESV_RESERVED; region = iommu_alloc_resv_region(entry->address_start, -length, prot, type); +length, prot, type, +GFP_KERNEL); if (!region) { dev_err(dev, "Out of memory allocating dm-regions\n"); return; @@ -3146,14 +3147,14 @@ static void amd_iommu_get_resv_regions(struct device *dev, region = iommu_alloc_resv_region(MSI_RANGE_START, MSI_RANGE_END - MSI_RANGE_START + 1, -0, IOMMU_RESV_MSI); +0, IOMMU_RESV_MSI, GFP_KERNEL); if (!region) return; list_add_tail(>list, head); region = iommu_alloc_resv_region(HT_RANGE_START, HT_RANGE_END - HT_RANGE_START + 1, -0, IOMMU_RESV_RESERVED); +0, IOMMU_RESV_RESERVED, GFP_KERNEL); if (!region) return; list_add_tail(>list, head); diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 4d5a694f02c2..f9b1279ef5bf 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -2226,7 +2226,7 @@ static void arm_smmu_get_resv_regions(struct device *dev, int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO; region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH, -prot, IOMMU_RESV_SW_MSI); +prot, IOMMU_RESV_SW_MSI, GFP_KERNEL); if (!region) return; diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 5e54cc0a28b3..646e76813e91 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -1670,7 +1670,7 @@ static void arm_smmu_get_resv_regions(struct device *dev, int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO; region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH, -prot, IOMMU_RESV_SW_MSI); +prot, IOMMU_RESV_SW_MSI, GFP_KERNEL); if (!region) return; diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index a209199f3af6..2be36dff189a 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -4220,7 +4220,7 @@ int __init dmar_parse_one_rmrr(struct acpi_dmar_header *header, void *arg) length = rmrr->end_address - rmrr->base_address + 1; rmrru->resv = iommu_alloc_resv_region(rmrr->base_address, length, prot, - IOMMU_RESV_DIRECT); + IOMMU_RESV_DIRECT, GFP_KERNEL); if (!rmrru->resv) goto free_rmrru; @@ -5489,7 +5489,7 @@ static void intel_iommu_get_resv_regions(struct device *device, reg = iommu_alloc_resv_region(IOAPIC_RANGE_START, IOAPIC_RANGE_END - IOAPIC_RANGE_START + 1, - 0, IOMMU_RESV_MSI); + 0, IOMMU_RESV_MSI, GFP_KERNEL); if (!reg) return; list_add_tail(>list,
[PATCH v2 3/7] iommu/vt-d: Introduce is_downstream_to_pci_bridge helper
Several call sites are about to check whether a device belongs to the PCI sub-hierarchy of a candidate PCI-PCI bridge. Introduce an helper to perform that check. Signed-off-by: Eric Auger --- drivers/iommu/intel-iommu.c | 37 + 1 file changed, 29 insertions(+), 8 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 590a0e78d11d..15c2f9677491 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -736,12 +736,39 @@ static int iommu_dummy(struct device *dev) return dev->archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO; } +/* is_downstream_to_pci_bridge - test if a device belongs to the + * PCI sub-hierarchy of a candidate PCI-PCI bridge + * + * @dev: candidate PCI device belonging to @bridge PCI sub-hierarchy + * @bridge: the candidate PCI-PCI bridge + * + * Return: true if @dev belongs to @bridge PCI sub-hierarchy + */ +static bool +is_downstream_to_pci_bridge(struct device *dev, struct device *bridge) +{ + struct pci_dev *pdev, *pbridge; + + if (!dev_is_pci(dev) || !dev_is_pci(bridge)) + return false; + + pdev = to_pci_dev(dev); + pbridge = to_pci_dev(bridge); + + if (pbridge->subordinate && + pbridge->subordinate->number <= pdev->bus->number && + pbridge->subordinate->busn_res.end >= pdev->bus->number) + return true; + + return false; +} + static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn) { struct dmar_drhd_unit *drhd = NULL; struct intel_iommu *iommu; struct device *tmp; - struct pci_dev *ptmp, *pdev = NULL; + struct pci_dev *pdev = NULL; u16 segment = 0; int i; @@ -787,13 +814,7 @@ static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devf goto out; } - if (!pdev || !dev_is_pci(tmp)) - continue; - - ptmp = to_pci_dev(tmp); - if (ptmp->subordinate && - ptmp->subordinate->number <= pdev->bus->number && - ptmp->subordinate->busn_res.end >= pdev->bus->number) + if (is_downstream_to_pci_bridge(dev, tmp)) goto got_pdev; } -- 2.20.1
[PATCH v2 2/7] iommu/vt-d: Duplicate iommu_resv_region objects per device list
intel_iommu_get_resv_regions() aims to return the list of reserved regions accessible by a given @device. However several devices can access the same reserved memory region and when building the list it is not safe to use a single iommu_resv_region object, whose container is the RMRR. This iommu_resv_region must be duplicated per device reserved region list. Let's remove the struct iommu_resv_region from the RMRR unit and allocate the iommu_resv_region directly in intel_iommu_get_resv_regions(). Fixes: 0659b8dc45a6 ("iommu/vt-d: Implement reserved region get/put callbacks") Signed-off-by: Eric Auger --- drivers/iommu/intel-iommu.c | 31 --- 1 file changed, 16 insertions(+), 15 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 2be36dff189a..590a0e78d11d 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -322,7 +322,6 @@ struct dmar_rmrr_unit { u64 end_address;/* reserved end address */ struct dmar_dev_scope *devices; /* target devices */ int devices_cnt;/* target device count */ - struct iommu_resv_region *resv; /* reserved region handle */ }; struct dmar_atsr_unit { @@ -4205,7 +4204,6 @@ static inline void init_iommu_pm_ops(void) {} int __init dmar_parse_one_rmrr(struct acpi_dmar_header *header, void *arg) { struct acpi_dmar_reserved_memory *rmrr; - int prot = DMA_PTE_READ|DMA_PTE_WRITE; struct dmar_rmrr_unit *rmrru; size_t length; @@ -4219,22 +4217,16 @@ int __init dmar_parse_one_rmrr(struct acpi_dmar_header *header, void *arg) rmrru->end_address = rmrr->end_address; length = rmrr->end_address - rmrr->base_address + 1; - rmrru->resv = iommu_alloc_resv_region(rmrr->base_address, length, prot, - IOMMU_RESV_DIRECT, GFP_KERNEL); - if (!rmrru->resv) - goto free_rmrru; rmrru->devices = dmar_alloc_dev_scope((void *)(rmrr + 1), ((void *)rmrr) + rmrr->header.length, >devices_cnt); if (rmrru->devices_cnt && rmrru->devices == NULL) - goto free_all; + goto free_rmrru; list_add(>list, _rmrr_units); return 0; -free_all: - kfree(rmrru->resv); free_rmrru: kfree(rmrru); out: @@ -4452,7 +,6 @@ static void intel_iommu_free_dmars(void) list_for_each_entry_safe(rmrru, rmrr_n, _rmrr_units, list) { list_del(>list); dmar_free_dev_scope(>devices, >devices_cnt); - kfree(rmrru->resv); kfree(rmrru); } @@ -5470,6 +5461,7 @@ static void intel_iommu_remove_device(struct device *dev) static void intel_iommu_get_resv_regions(struct device *device, struct list_head *head) { + int prot = DMA_PTE_READ|DMA_PTE_WRITE; struct iommu_resv_region *reg; struct dmar_rmrr_unit *rmrr; struct device *i_dev; @@ -5479,10 +5471,21 @@ static void intel_iommu_get_resv_regions(struct device *device, for_each_rmrr_units(rmrr) { for_each_active_dev_scope(rmrr->devices, rmrr->devices_cnt, i, i_dev) { + struct iommu_resv_region *resv; + size_t length; + if (i_dev != device) continue; - list_add_tail(>resv->list, head); + length = rmrr->end_address - rmrr->base_address + 1; + resv = iommu_alloc_resv_region(rmrr->base_address, + length, prot, + IOMMU_RESV_DIRECT, + GFP_ATOMIC); + if (!resv) + break; + + list_add_tail(>list, head); } } rcu_read_unlock(); @@ -5500,10 +5503,8 @@ static void intel_iommu_put_resv_regions(struct device *dev, { struct iommu_resv_region *entry, *next; - list_for_each_entry_safe(entry, next, head, list) { - if (entry->type == IOMMU_RESV_MSI) - kfree(entry); - } + list_for_each_entry_safe(entry, next, head, list) + kfree(entry); } int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev) -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 0/7] RMRR related fixes and enhancements
Currently the Intel reserved region is attached to the RMRR unit and when building the list of RMRR seen by a device we link this unique reserved region without taking care of potential multiple usage of this reserved region by several devices. Also while reading the vtd spec it is unclear to me whether the RMRR device scope referenced by an RMRR ACPI struct could be a PCI-PCI bridge, in which case I think we also need to check the device belongs to the PCI sub-hierarchy of the device referenced in the scope. This would be true for device_has_rmrr() and intel_iommu_get_resv_regions(). Last, the VFIO subsystem would need to compute the usable IOVA range by querying the iommu_get_group_resv_regions() API. This would allow, for instance, to report potential conflicts between the guest physical address space and host reserved regions. However iommu_get_group_resv_regions() currently fails to differentiate RMRRs that are known safe for device assignment and RMRRs that must be enforced. So we introduce a new reserved memory region type (relaxable), reported when associated to an USB or GFX device. The last 2 patches aim at unblocking [1] which is stuck since 4.18. [1-5] are fixes [6-7] are enhancements References: [1] [PATCH v6 0/7] vfio/type1: Add support for valid iova list management https://patchwork.kernel.org/patch/10425309/ The two parts can be considered separately if needed. History 1->2: - introduce is_downstream_to_pci_bridge() in a separate patch, change param names and add kerneldoc comment (Jacob) - add 6,7 Eric Auger (7): iommu: Pass a GFP flag parameter to iommu_alloc_resv_region() iommu/vt-d: Duplicate iommu_resv_region objects per device list iommu/vt-d: Introduce is_downstream_to_pci_bridge helper iommu/vt-d: Handle RMRR with PCI bridge device scopes iommu/vt-d: Handle PCI bridge RMRR device scopes in intel_iommu_get_resv_regions iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions iommu/vt-d: Differentiate relaxable and non relaxable RMRRs drivers/acpi/arm64/iort.c | 3 +- drivers/iommu/amd_iommu.c | 7 ++-- drivers/iommu/arm-smmu-v3.c | 2 +- drivers/iommu/arm-smmu.c| 2 +- drivers/iommu/intel-iommu.c | 82 + drivers/iommu/iommu.c | 19 + include/linux/iommu.h | 8 +++- 7 files changed, 82 insertions(+), 41 deletions(-) -- 2.20.1
Re: Linux crash when using FTDI FT232R USB to serial converter on AMD boards.
Dne 16.5.2019 v 9:58 Oliver Neukum napsal(a): 2.Send a new report to the corresponding mailing list Which mailing list is correct? In that case linux-...@vger.kernel.org HTH Oliver Is there some rules how to send bug report? Or I can send report with "my amateur description"? starosta ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu: io-pgtable: Support non-coherent page tables
On Thu, May 16, 2019 at 5:03 AM Bjorn Andersson wrote: > > Describe the memory related to page table walks as non-cachable for iommu > instances that are not DMA coherent. > > Signed-off-by: Bjorn Andersson > --- > drivers/iommu/io-pgtable-arm.c | 12 +--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c > index 4e21efbc4459..68ff22ffd2cb 100644 > --- a/drivers/iommu/io-pgtable-arm.c > +++ b/drivers/iommu/io-pgtable-arm.c > @@ -803,9 +803,15 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, > void *cookie) > return NULL; > > /* TCR */ > - reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) | > - (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) | > - (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT); > + if (cfg->quirks & IO_PGTABLE_QUIRK_NO_DMA) { > + reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) | > + (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) | > + (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT); > + } else { > + reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) | > + (ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_IRGN0_SHIFT) | > + (ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_ORGN0_SHIFT); > + } This looks okay to me based on the discussion that we had on a similar patch that I posted. So, Reviewed-by: Vivek Gautam [1] https://lore.kernel.org/patchwork/patch/1032939/ Thanks & regards Vivek > > switch (ARM_LPAE_GRANULE(data)) { > case SZ_4K: > -- > 2.18.0 > > ___ > iommu mailing list > iommu@lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/iommu -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu