Re: Support SVM without PASID
On 2017/8/7 20:52, Jean-Philippe Brucker wrote:
> Hi Bob,
>
> On 07/08/17 13:18, Bob Liu wrote:
>> On 2017/8/7 18:31, Jean-Philippe Brucker wrote:
>>> On 05/08/17 06:14, valmiki wrote:
>>> [...]
>>>> Hi Jean, Thanks a lot, now i understood the flow. From vfio kernel
>>>> documentation we fill vaddr and iova in struct vfio_iommu_type1_dma_map
>>>> and pass them to VFIO. But if we use dynamic allocation in application
>>>> (say malloc), do we need to use dma API to get iova and then call
>>>> VFIO_IOMMU_MAP ioctl ?
>>>> If application needs multiple such dynamic allocations, then it need to
>>>> allocate large chunk and program it via VFIO_IOMMU_MAP ioctl and then
>>>> manage rest allocations requirements from this buffer ?
>>>
>>> Yes, without SVM, the application allocates large buffers, allocates IOVAs
>>> itself, and maps them with VFIO_IOMMU_MAP. Userspace doesn't rely on the
>>> DMA API at all, it manages IOVAs as it wants. Sizes passed to
>>> VFIO_IOMMU_MAP have to be multiples of the MMU or IOMMU page granularity
>>> (that is at least 4kB), and both iova and vaddr have to be aligned on that
>>> granularity as well. So malloc isn't really suitable in this case, you'll
>>> need mmap. The application can then implement a small allocator to manage
>>> the DMA pool created with VFIO_IOMMU_MAP.
>>>
>>> With SVM the application binds its address space to the device, and then
>>> uses malloc for all DMA buffers, no need for VFIO_IOMMU_MAP.
>>
>> Hi Jean,
>>
>> I think there is another way to support SVM without PASID.
>>
>> Suppose there is a device in the same SOC-chip, the device access memory
>> through SMMU(using internal bus instead of PCIe)
>> Once page fault, the device send an event with (vaddr, substreamID) to SMMU,
>> then SMMU triggers an event interrupt.
>>
>> In the event interrupt handler, we can implement the same logic as PRI
>> interrupt in your patch.
>> What do you think about that?
>
> What you're describing is the SMMU stall model for platform devices. From
> the driver perspective it's the same as PRI and PASID (SubstreamID=PASID).

Indeed!

> When a stall-capable device accesses unmapped memory, the SMMU parks the
> transaction and sends an event marked "stall" on the event queue, with a
> stall tag (STAG, roughly equivalent to PRG Index). The OS handles the
> fault and sends a CMD_RESUME command with the status and the STAG. Then
> the SMMU completes the access or terminates it.
>
> In a prototype I have, the stall implementation reuses most of the
> PASID/PRI code.

Glad to hear that. Would you mind sharing the prototype patch?

> The main difficulty is defining SSID and stall capability
> in firmware, as there is no standard capability probing for platform
> devices. Stall-capable devices must be able to wait an indefinite amount
> of time for their DMA transactions to return, therefore the stall model
> cannot work with PCI, only with some integrated devices.

I happen to have a board with such devices and would like to do the test.
I will re-post a full version of the patch upstream once the verification
is complete.

Thanks,
Bob Liu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 3/3] iommu: prevent VMD child devices from being remapping targets
VMD child devices must use the VMD endpoint's ID as the DMA source.
Because of this, there needs to be a way to link the parent VMD
endpoint's DMAR domain to the VMD child devices' DMAR domain, such that
attaching and detaching child devices modify the endpoint's DMAR mapping
and prevent early detaching. This is outside the scope of VMD, so
disable binding child devices to prevent unforeseen issues. This
functionality may be implemented in the future.

This patch prevents VMD child devices from returning an IOMMU, which
prevents them from exposing iommu_group sysfs directories and from
subsequent binding by userspace-access drivers such as VFIO.

Signed-off-by: Jon Derrick
---
 drivers/iommu/intel-iommu.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 687f18f..651a6cd 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -905,6 +905,11 @@ static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devf
 		 * the PF instead to find the IOMMU. */
 		pf_pdev = pci_physfn(pdev);
 		dev = &pf_pdev->dev;
+
+		/* VMD child devices currently cannot be handled individually */
+		if (pci_bus_is_vmd(pdev->bus))
+			return NULL;
+
 		segment = pci_domain_nr(pdev->bus);
 	} else if (has_acpi_companion(dev))
 		dev = &ACPI_COMPANION(dev)->dev;
-- 
2.9.4
[PATCH 1/3] MAINTAINERS: Add Jonathan Derrick as VMD maintainer
Add myself as VMD maintainer

Signed-off-by: Jon Derrick
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f66488d..3ec39df 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10090,6 +10090,7 @@ F:	drivers/pci/dwc/*imx6*
 PCI DRIVER FOR INTEL VOLUME MANAGEMENT DEVICE (VMD)
 M:	Keith Busch
+M:	Jonathan Derrick
 L:	linux-...@vger.kernel.org
 S:	Supported
 F:	drivers/pci/host/vmd.c
-- 
2.9.4
[PATCH 2/3] pci: Generalize is_vmd behavior
Generalize is_vmd behavior to remove the dependency on domain number
checking in PCI quirks.

Signed-off-by: Jon Derrick
---
 arch/x86/include/asm/pci.h | 8 +++-----
 arch/x86/pci/common.c      | 2 +-
 drivers/pci/quirks.c       | 2 +-
 include/linux/pci.h        | 4 ++++
 4 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
index 473a729..5c5d54a 100644
--- a/arch/x86/include/asm/pci.h
+++ b/arch/x86/include/asm/pci.h
@@ -60,16 +60,14 @@ static inline void *_pci_root_bus_fwnode(struct pci_bus *bus)
 #define pci_root_bus_fwnode	_pci_root_bus_fwnode
 #endif
 
-static inline bool is_vmd(struct pci_bus *bus)
-{
 #if IS_ENABLED(CONFIG_VMD)
+static inline bool pci_bus_is_vmd(struct pci_bus *bus)
+{
 	struct pci_sysdata *sd = bus->sysdata;
 
 	return sd->vmd_domain;
-#else
-	return false;
-#endif
 }
+#endif
 
 /* Can be used to override the logic in pci_scan_bus for skipping
    already-configured bus numbers - to be used for buggy BIOSes
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index dbe2132..18b2277 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -662,7 +662,7 @@ static void set_dma_domain_ops(struct pci_dev *pdev) {}
 
 static void set_dev_domain_options(struct pci_dev *pdev)
 {
-	if (is_vmd(pdev->bus))
+	if (pci_bus_is_vmd(pdev->bus))
 		pdev->hotplug_user_indicators = 1;
 }
 
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6967c6b..ba47995 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4666,7 +4666,7 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
 static void quirk_no_aersid(struct pci_dev *pdev)
 {
 	/* VMD Domain */
-	if (pdev->bus->sysdata && pci_domain_nr(pdev->bus) >= 0x10000)
+	if (pci_bus_is_vmd(pdev->bus))
 		pdev->bus->bus_flags |= PCI_BUS_FLAGS_NO_AERSID;
 }
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2030, quirk_no_aersid);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4869e66..0299d8b 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1471,6 +1471,10 @@ static inline int pci_proc_domain(struct pci_bus *bus) { return 0; }
 static inline int pci_get_new_domain_nr(void) { return -ENOSYS; }
 #endif /* CONFIG_PCI_DOMAINS */
 
+#if !IS_ENABLED(CONFIG_VMD)
+static inline bool pci_bus_is_vmd(struct pci_bus *bus) { return false; }
+#endif
+
 /*
  * Generic implementation for PCI domain support. If your
  * architecture does not need custom management of PCI
-- 
2.9.4
[CFP] VFIO/IOMMU/PCI microconference at LPC 2017
Hi,

Following the official LPC17 VFIO/IOMMU/PCI uconf acceptance notification:

https://www.linuxplumbersconf.org/2017/vfioiommupci-microconference-accepted-into-linux-plumbers-conference/

I am sending out a call for session proposals, open to all developers
interested/involved in Linux kernel VFIO/IOMMU/PCI development.

The LPC17 uconf wiki provides a list of topics that we put forward for
the microconference submission:

http://wiki.linuxplumbersconf.org/2017:vfio_iommu_pci

The wiki is there to provide a list of topics that we considered key,
but it should not be considered final; it is actually a starting point
to define a possible schedule structure.

Session proposals for the LPC17 VFIO/IOMMU/PCI microconference are
warmly encouraged and can be submitted here:

https://linuxplumbersconf.org/2017/ocw/events/LPC2017/tracks/636

Anyone involved in VFIO/IOMMU/PCI kernel development: if you wish to add
sessions and attend the microconference, consider yourself welcome. For
any questions just reply to this thread or drop me a line.

Looking forward to meeting you all in Los Angeles!

Lorenzo
Re: [PATCH v5 1/2] ACPI/IORT: Add ITS address regions reservation helper
On Mon, Aug 07, 2017 at 08:21:40AM +0000, Shameerali Kolothum Thodi wrote:
> > -----Original Message-----
> > From: Robin Murphy [mailto:robin.mur...@arm.com]
> > Sent: Friday, August 04, 2017 5:57 PM
> > To: Shameerali Kolothum Thodi; lorenzo.pieral...@arm.com;
> > marc.zyng...@arm.com; sudeep.ho...@arm.com; will.dea...@arm.com;
> > hanjun@linaro.org
> > Cc: Gabriele Paoloni; John Garry; iommu@lists.linux-foundation.org;
> > linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org;
> > de...@acpica.org; Linuxarm; Wangzhou (B); Guohanjun (Hanjun Guo)
> > Subject: Re: [PATCH v5 1/2] ACPI/IORT: Add ITS address regions
> > reservation helper
> >
> > On 01/08/17 11:49, Shameer Kolothum wrote:
> > > On some platforms ITS address regions have to be excluded from normal
> > > IOVA allocation in that they are detected and decoded in a HW specific
> > > way by system components and so they cannot be considered normal IOVA
> > > address space.
> > >
> > > Add a helper function that retrieves ITS address regions through IORT
> > > device <-> ITS mappings and reserves them so that these regions will
> > > not be translated by the IOMMU and will be excluded from IOVA
> > > allocations.
> >
> > I've just realised that we no longer seem to have a check that ensures
> > the regions are only reserved on platforms that need it - if not, then
> > we're going to break everything else that does have an ITS behind SMMU
> > translation as expected.
>
> Right. I had this doubt, but then my thinking was that we will have the
> SW_MSI regions for those and will end up using that. But that doesn't
> seem to be the case now.
>
> > It feels like IORT should know enough to be able to make that decision
> > internally, but if not (or if it would be hideous to do so), then I
> > guess my idea for patch #2 was a bust and we probably do need to go back
> > to calling directly from the SMMU driver based on the SMMU model.
>
> It might be possible to do that check inside iort code, but then we have
> to find the smmu node first and check the model number. I think it will
> be cleaner if the SMMU driver makes that check and calls
> iommu_dma_get_msi_resv_regions().

+1 on this one - we can do it in IORT but I think it is more logical to
have a flag in the SMMU driver (keeping the DT/ACPI fwnode switch in the
generic IOMMU layer though).

Side note I: AFAICS iommu_dma_get_resv_regions() is only used on ARM; if
it is not, patch 2 would break miserably on arches that do not select
IORT - you should rework the return code on !CONFIG_ACPI_IORT configs.

Side note II (nit): why not call the function iort_iommu_get_resv_regions()?

Lorenzo

> If you are Ok with that I will quickly rework and send out the next version.
>
> Thanks,
> Shameer
>
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Re: Support SVM without PASID
Hi Bob,

On 07/08/17 13:18, Bob Liu wrote:
> On 2017/8/7 18:31, Jean-Philippe Brucker wrote:
>> On 05/08/17 06:14, valmiki wrote:
>> [...]
>>> Hi Jean, Thanks a lot, now i understood the flow. From vfio kernel
>>> documentation we fill vaddr and iova in struct vfio_iommu_type1_dma_map
>>> and pass them to VFIO. But if we use dynamic allocation in application
>>> (say malloc), do we need to use dma API to get iova and then call
>>> VFIO_IOMMU_MAP ioctl ?
>>> If application needs multiple such dynamic allocations, then it need to
>>> allocate large chunk and program it via VFIO_IOMMU_MAP ioctl and then
>>> manage rest allocations requirements from this buffer ?
>>
>> Yes, without SVM, the application allocates large buffers, allocates IOVAs
>> itself, and maps them with VFIO_IOMMU_MAP. Userspace doesn't rely on the
>> DMA API at all, it manages IOVAs as it wants. Sizes passed to
>> VFIO_IOMMU_MAP have to be multiples of the MMU or IOMMU page granularity
>> (that is at least 4kB), and both iova and vaddr have to be aligned on that
>> granularity as well. So malloc isn't really suitable in this case, you'll
>> need mmap. The application can then implement a small allocator to manage
>> the DMA pool created with VFIO_IOMMU_MAP.
>>
>> With SVM the application binds its address space to the device, and then
>> uses malloc for all DMA buffers, no need for VFIO_IOMMU_MAP.
>
> Hi Jean,
>
> I think there is another way to support SVM without PASID.
>
> Suppose there is a device in the same SOC-chip, the device access memory
> through SMMU(using internal bus instead of PCIe)
> Once page fault, the device send an event with (vaddr, substreamID) to SMMU,
> then SMMU triggers an event interrupt.
>
> In the event interrupt handler, we can implement the same logic as PRI
> interrupt in your patch.
> What do you think about that?

What you're describing is the SMMU stall model for platform devices. From
the driver perspective it's the same as PRI and PASID (SubstreamID=PASID).

When a stall-capable device accesses unmapped memory, the SMMU parks the
transaction and sends an event marked "stall" on the event queue, with a
stall tag (STAG, roughly equivalent to PRG Index). The OS handles the
fault and sends a CMD_RESUME command with the status and the STAG. Then
the SMMU completes the access or terminates it.

In a prototype I have, the stall implementation reuses most of the
PASID/PRI code. The main difficulty is defining SSID and stall capability
in firmware, as there is no standard capability probing for platform
devices. Stall-capable devices must be able to wait an indefinite amount
of time for their DMA transactions to return, therefore the stall model
cannot work with PCI, only with some integrated devices.

Thanks,
Jean
Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
On Mon, Aug 7, 2017 at 4:27 AM, Vivek Gautam wrote:
> On Thu, Jul 13, 2017 at 5:20 PM, Rob Clark wrote:
>> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R wrote:
>>> Hi Vivek,
>>>
>>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>>>> Hi Stephen,
>>>>
>>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>>>>> On 07/06, Vivek Gautam wrote:
>>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>>>>>>  static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>>>>>>  			     size_t size)
>>>>>>  {
>>>>>> -	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>>>>>> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>>>> +	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>>>>>> +	size_t ret;
>>>>>>
>>>>>>  	if (!ops)
>>>>>>  		return 0;
>>>>>>
>>>>>> -	return ops->unmap(ops, iova, size);
>>>>>> +	pm_runtime_get_sync(smmu_domain->smmu->dev);
>>>>> Can these map/unmap ops be called from an atomic context? I seem
>>>>> to recall that being a problem before.
>>>>
>>>> That's something which was dropped in the following patch merged in master:
>>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>>>
>>>> Looks like we don't need locks here anymore?
>>>
>>> Apart from the locking, wonder why an explicit pm_runtime is needed
>>> from unmap. Somehow looks like some path in the master using that
>>> should have enabled the pm?
>>
>> Yes, there are a bunch of scenarios where unmap can happen with
>> a disabled master (but not in atomic context).
>
> I would like to understand whether there is a situation where an unmap is
> called in atomic context without an enabled master?
>
> Let's say we have the case where all the unmap calls in atomic context
> happen only from the master's context (in which case the device link
> should take care of the pm state of smmu), and the only unmap that
> happens in non-atomic context is the one with the master disabled.
> In such a case doesn't it make sense to distinguish the
> atomic/non-atomic context and add pm_runtime_get_sync()/put_sync() only
> for the non-atomic context, since that would be the one with the master
> disabled.

At least drm/msm needs to hold obj->lock (a mutex) in unmap, so it
won't unmap anything in atomic ctx (but it can unmap w/ master
disabled). I can't really comment about other non-gpu drivers. It seems
like a reasonable constraint that either master is enabled or not in
atomic ctx.

Currently we actually wrap unmap w/ pm_runtime_get/put_sync(), but I'd
like to drop that to avoid powering up the gpu.

BR,
-R
Re: Support SVM without PASID
On 2017/8/7 18:31, Jean-Philippe Brucker wrote:
> On 05/08/17 06:14, valmiki wrote:
> [...]
>> Hi Jean, Thanks a lot, now i understood the flow. From vfio kernel
>> documentation we fill vaddr and iova in struct vfio_iommu_type1_dma_map
>> and pass them to VFIO. But if we use dynamic allocation in application
>> (say malloc), do we need to use dma API to get iova and then call
>> VFIO_IOMMU_MAP ioctl ?
>> If application needs multiple such dynamic allocations, then it need to
>> allocate large chunk and program it via VFIO_IOMMU_MAP ioctl and then
>> manage rest allocations requirements from this buffer ?
>
> Yes, without SVM, the application allocates large buffers, allocates IOVAs
> itself, and maps them with VFIO_IOMMU_MAP. Userspace doesn't rely on the
> DMA API at all, it manages IOVAs as it wants. Sizes passed to
> VFIO_IOMMU_MAP have to be multiples of the MMU or IOMMU page granularity
> (that is at least 4kB), and both iova and vaddr have to be aligned on that
> granularity as well. So malloc isn't really suitable in this case, you'll
> need mmap. The application can then implement a small allocator to manage
> the DMA pool created with VFIO_IOMMU_MAP.
>
> With SVM the application binds its address space to the device, and then
> uses malloc for all DMA buffers, no need for VFIO_IOMMU_MAP.

Hi Jean,

I think there is another way to support SVM without PASID.

Suppose there is a device in the same SoC chip that accesses memory
through the SMMU (using an internal bus instead of PCIe).
On a page fault, the device sends an event with (vaddr, substreamID) to
the SMMU, and the SMMU then triggers an event interrupt.

In the event interrupt handler, we can implement the same logic as the
PRI interrupt in your patch.
What do you think about that?

Thanks,
Bob Liu
Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
On Thu, Jul 13, 2017 at 5:20 PM, Rob Clark wrote:
> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R wrote:
>> Hi Vivek,
>>
>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>>> Hi Stephen,
>>>
>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>>>> On 07/06, Vivek Gautam wrote:
>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>>>>>  static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>>>>>  			     size_t size)
>>>>>  {
>>>>> -	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>>>>> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>>> +	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>>>>> +	size_t ret;
>>>>>
>>>>>  	if (!ops)
>>>>>  		return 0;
>>>>>
>>>>> -	return ops->unmap(ops, iova, size);
>>>>> +	pm_runtime_get_sync(smmu_domain->smmu->dev);
>>>> Can these map/unmap ops be called from an atomic context? I seem
>>>> to recall that being a problem before.
>>>
>>> That's something which was dropped in the following patch merged in master:
>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>>
>>> Looks like we don't need locks here anymore?
>>
>> Apart from the locking, wonder why an explicit pm_runtime is needed
>> from unmap. Somehow looks like some path in the master using that
>> should have enabled the pm?
>
> Yes, there are a bunch of scenarios where unmap can happen with
> a disabled master (but not in atomic context).

I would like to understand whether there is a situation where an unmap
is called in atomic context without an enabled master?

Let's say we have the case where all the unmap calls in atomic context
happen only from the master's context (in which case the device link
should take care of the pm state of smmu), and the only unmap that
happens in non-atomic context is the one with the master disabled.
In such a case doesn't it make sense to distinguish the
atomic/non-atomic context and add pm_runtime_get_sync()/put_sync() only
for the non-atomic context, since that would be the one with the master
disabled.

Thanks
Vivek

> On the gpu side we
> opportunistically keep a buffer mapping until the buffer is freed
> (which can happen after gpu is disabled). Likewise, v4l2 won't unmap
> an exported dmabuf while some other driver holds a reference to it
> (which can be dropped when the v4l2 device is suspended).
>
> Since unmap triggers tlb flush which touches iommu regs, the iommu
> driver *definitely* needs a pm_runtime_get_sync().
>
> BR,
> -R
> --
> To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
RE: [PATCH v5 1/2] ACPI/IORT: Add ITS address regions reservation helper
> -----Original Message-----
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: Friday, August 04, 2017 5:57 PM
> To: Shameerali Kolothum Thodi; lorenzo.pieral...@arm.com;
> marc.zyng...@arm.com; sudeep.ho...@arm.com; will.dea...@arm.com;
> hanjun@linaro.org
> Cc: Gabriele Paoloni; John Garry; iommu@lists.linux-foundation.org;
> linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org;
> de...@acpica.org; Linuxarm; Wangzhou (B); Guohanjun (Hanjun Guo)
> Subject: Re: [PATCH v5 1/2] ACPI/IORT: Add ITS address regions
> reservation helper
>
> On 01/08/17 11:49, Shameer Kolothum wrote:
> > On some platforms ITS address regions have to be excluded from normal
> > IOVA allocation in that they are detected and decoded in a HW specific
> > way by system components and so they cannot be considered normal IOVA
> > address space.
> >
> > Add a helper function that retrieves ITS address regions through IORT
> > device <-> ITS mappings and reserves them so that these regions will
> > not be translated by the IOMMU and will be excluded from IOVA
> > allocations.
>
> I've just realised that we no longer seem to have a check that ensures
> the regions are only reserved on platforms that need it - if not, then
> we're going to break everything else that does have an ITS behind SMMU
> translation as expected.

Right. I had this doubt, but then my thinking was that we will have the
SW_MSI regions for those and will end up using that. But that doesn't
seem to be the case now.

> It feels like IORT should know enough to be able to make that decision
> internally, but if not (or if it would be hideous to do so), then I
> guess my idea for patch #2 was a bust and we probably do need to go back
> to calling directly from the SMMU driver based on the SMMU model.

It might be possible to do that check inside iort code, but then we have
to find the smmu node first and check the model number. I think it will
be cleaner if the SMMU driver makes that check and calls
iommu_dma_get_msi_resv_regions().

If you are Ok with that I will quickly rework and send out the next version.

Thanks,
Shameer
[RFC PATCH v5 5/5] vfio-pci: Allow to expose MSI-X table to userspace when safe
Some devices have an MSI-X BAR that is not aligned to the system page
size when that size is greater than 4K (like 64K for ppc64), which at
the moment prevents such MMIO pages from being mapped to userspace for
the sake of MSI-X BAR content protection. If such a page happens to
share the same system page with some frequently accessed registers, the
entire system page will be emulated, which can seriously affect
performance.

This allows mapping of MSI-X tables to userspace if the hardware
provides MSI-X isolation via interrupt remapping or filtering; in other
words, allowing direct access to the MSI-X BAR won't do any harm to
other devices or cause spurious interrupts visible to the kernel.

This adds a wrapping helper to check if a capability is supported by an
IOMMU group.

Signed-off-by: Alexey Kardashevskiy
---
 include/linux/vfio.h             |  1 +
 drivers/vfio/pci/vfio_pci.c      | 20 +++++++++++++++++---
 drivers/vfio/pci/vfio_pci_rdwr.c |  5 ++++-
 drivers/vfio/vfio.c              | 15 +++++++++++++++
 4 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 586809abb273..7110bca2fb60 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -46,6 +46,7 @@ struct vfio_device_ops {
 extern struct iommu_group *vfio_iommu_group_get(struct device *dev);
 extern void vfio_iommu_group_put(struct iommu_group *group, struct device *dev);
+extern bool vfio_iommu_group_is_capable(struct device *dev, unsigned long cap);
 
 extern int vfio_add_group_dev(struct device *dev,
 			      const struct vfio_device_ops *ops,
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index d87a0a3cda14..c4c39ed64b1e 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -561,11 +561,17 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device *vdev,
 	struct vfio_region_info_cap_sparse_mmap *sparse;
 	size_t end, size;
 	int nr_areas = 2, i = 0, ret;
+	bool is_msix_isolated = vfio_iommu_group_is_capable(&vdev->pdev->dev,
+					IOMMU_GROUP_CAP_ISOLATE_MSIX);
 
 	end = pci_resource_len(vdev->pdev, vdev->msix_bar);
 
-	/* If MSI-X table is aligned to the start or end, only one area */
-	if (((vdev->msix_offset & PAGE_MASK) == 0) ||
+	/*
+	 * If MSI-X table is allowed to mmap because of the capability
+	 * of IRQ remapping or aligned to the start or end, only one area
+	 */
+	if (is_msix_isolated ||
+	    ((vdev->msix_offset & PAGE_MASK) == 0) ||
 	    (PAGE_ALIGN(vdev->msix_offset + vdev->msix_size) >= end))
 		nr_areas = 1;
 
@@ -577,6 +583,12 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device *vdev,
 
 	sparse->nr_areas = nr_areas;
 
+	if (is_msix_isolated) {
+		sparse->areas[i].offset = 0;
+		sparse->areas[i].size = end;
+		return 0;
+	}
+
 	if (vdev->msix_offset & PAGE_MASK) {
 		sparse->areas[i].offset = 0;
 		sparse->areas[i].size = vdev->msix_offset & PAGE_MASK;
@@ -1094,6 +1106,8 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 	unsigned int index;
 	u64 phys_len, req_len, pgoff, req_start;
 	int ret;
+	bool is_msix_isolated = vfio_iommu_group_is_capable(&vdev->pdev->dev,
+					IOMMU_GROUP_CAP_ISOLATE_MSIX);
 
 	index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
 
@@ -1115,7 +1129,7 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 	if (req_start + req_len > phys_len)
 		return -EINVAL;
 
-	if (index == vdev->msix_bar) {
+	if (index == vdev->msix_bar && !is_msix_isolated) {
 		/*
 		 * Disallow mmaps overlapping the MSI-X table; users don't
 		 * get to touch this directly.  We could find somewhere
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 357243d76f10..7514206a5ea7 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "vfio_pci_private.h"
 
@@ -123,6 +124,8 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
 	resource_size_t end;
 	void __iomem *io;
 	ssize_t done;
+	bool is_msix_isolated = vfio_iommu_group_is_capable(&vdev->pdev->dev,
+					IOMMU_GROUP_CAP_ISOLATE_MSIX);
 
 	if (pci_resource_start(pdev, bar))
 		end = pci_resource_len(pdev, bar);
@@ -164,7 +167,7 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
 	} else
 		io = vdev->barmap[bar];
 
-	if (bar == vdev->msix_bar) {
+	if (bar == vdev->msix_bar && !is_msix_isolated) {
 		x_start = vdev->msix_offset;
 		x_end = vdev->msix_offset + vdev->msix_size;
 	}
Re: [PATCH] iommu/arm-smmu: Defer TLB flush in case of unmap op
Hi Robin, On Fri, Aug 4, 2017 at 10:34 PM, Robin Murphywrote: > On 03/08/17 06:35, Vivek Gautam wrote: >> Hi Robin, >> >> >> >> On 08/02/2017 05:47 PM, Robin Murphy wrote: >>> On 02/08/17 10:53, Vivek Gautam wrote: We don't want to touch the TLB when smmu is suspended. Defer it until resume. Signed-off-by: Vivek Gautam --- Hi all, Here's the small patch in response of suggestion to defer tlb operations when smmu is in suspend state. The patch stores the TLB requests in 'unmap' when the smmu device is suspended. On resume, it checks all the pending TLB requests, and performs the unmap over those. Right now, I have applied the patch on top of the pm runtime series. Let me know what you think of the change. It will also be helpful if somebody can please test a valid use case with this. >>> The patch itself doesn't make much sense to me, but more crucially it's >>> definitely broken in concept. We can't return from arm_smmu_unmap() >>> without having actually unmapped anything, because that leaves the page >>> tables out of sync with what the caller expects - they may immmediately >>> reuse that IOVA to map something else for a different device and hit an >>> unexpected failure from io-pgtable when the PTE turns out to be >>> non-empty. >> >> To understand things bit more, >> once we don't *unmap* in arm_smmu_unmap(), and leave the TLBs as is, >> the next mapping can happen only with the *knowledge* of smmu, i.e., >> smmu should be active at that time. >> If that's true then, the _runtime()_resume() method will take care of >> invalidating the TLBs when we call arm_smmu_unmap() from _runtime_resume(). >> Is my understanding correct here? > > What I mean is that it's OK for arm_smmu_unmap() to defer the physical > TLB maintenance for an unmap request if the SMMU is suspended, but it > *must* still update the pagetable so that the given address is logically > unmapped before returning. 
In other words, the place to make decisions > based on the SMMU PM state would be in the .tlb_add_flush and .tlb_sync > callbacks, rather than at the top level. Okay, i understand it better now. .tlb_add_flush and .tlb_sync callbacks should be the right place. > >>> However, if in general suspend *might* power-gate any part of the SMMU, >>> then I don't think we have any guarantee of what state any TLBs could be >>> in upon resume. Therefore any individual invalidations we skip while >>> suspended are probably moot, since resume would almost certainly have to >>> invalidate everything to get back to a safe state anyway. >> >> Right, in case when the suspend power-gates the SMMU, then >> the TLB context is lost anyways. So resume path can freshly start. >> This is something that exynos does at present. > > Yes, in general I don't think we can assume any SMMU state is preserved, > so the only safe option would be for .runtime_resume to do the same > thing as .resume, which does at least make things nice and simple. Let me try to find out more about the state of TLBs. As far as the programmable registers are concerned, qcom platforms have retention enabled for them. So they don't loose state after SMMU power down. > >>> Conversely though, the situation that still concerns me is whether this >>> can work at all for a distributed SMMU if things *don't* lose state. Say >>> the GPU and its local TBU are in the same clock domain - if the GPU has >>> just gone idle and we've clock-gated it, but "the SMMU" (i.e. the TCU) >>> is still active servicing other devices, we will assume we can happily >>> unmap GPU buffers and issue TLBIs, but what happens with entries held in >>> the unclocked TBU's micro-TLB? >> >> We know of platforms we have that have shared TCU and multiple TBUs. >> Each TBU is available in its own power domain, not in master's power >> domain. 
>> In such cases we may want to runtime_get() the TBUs, so that an unmap()
>> call with the master clock gated gets through.
>>
>> Can we have a situation where the TBU and master are in the same power
>> domain, and the unmap is called when the master is not runtime active?
>> How will such a situation be handled?
>
> Having thought about it a bit more, I think the
> unmap-after-master-suspended case is only one facet of the problem - if
> we can power down individual TBUs/micro-TLBs without suspending the rest
> of the SMMU, do we also have any guarantee that such TLBs don't power
> back on full of valid-looking random junk?
>
> I'm starting to think the only way to be generally safe would be to
> globally invalidate all TLBs after any *master* is resumed, and I'm not
> even sure that's feasible :/
>
> Robin.

regards
Vivek

--
Qualcomm
[RFC PATCH v5 4/5] powerpc/iommu: Set IOMMU_GROUP_CAP_ISOLATE_MSIX
This sets IOMMU_GROUP_CAP_ISOLATE_MSIX on a group unconditionally, as
there is no IOMMU-capable hardware without such a feature.

On IBM POWERPC (POWER8) [1], the PCI host bridge maintains a BDFN-to-PE
translation (PE stands for "partitionable endpoint"), and the PE index
is used to look up the Interrupt Vector Table (IVT) to identify the
interrupt server. Without these translations in place, MSI-X messages
won't pass the PHB.

[1] 3.2.4. MSI Design
http://openpowerfoundation.org/wp-content/uploads/resources/IODA2Spec/IODA2WGSpec-1.0.0-20160217.pdf

Signed-off-by: Alexey Kardashevskiy
---
 arch/powerpc/kernel/iommu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 233ca3fe4754..dca0a83f1560 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -936,6 +936,7 @@ void iommu_register_group(struct iommu_table_group *table_group,
 	if (!name)
 		return;
 	iommu_group_set_name(grp, name);
+	iommu_group_set_caps(grp, 0, IOMMU_GROUP_CAP_ISOLATE_MSIX);
 	kfree(name);
 }
--
2.11.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC PATCH v5 2/5] iommu: Set IOMMU_GROUP_CAP_ISOLATE_MSIX if MSI controller enables IRQ remapping
This sets IOMMU_GROUP_CAP_ISOLATE_MSIX on a group if MSI remapping is
enabled on its IRQ domain; this is expected to set the capability on ARM.

Signed-off-by: Alexey Kardashevskiy
---
 drivers/iommu/iommu.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 6b2c34fe2c3d..e720e90fa93c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -32,6 +32,7 @@
 #include
 #include
 #include
+#include <linux/irqdomain.h>
 #include

 static struct kset *iommu_group_kset;
@@ -1028,6 +1029,7 @@ struct iommu_group *iommu_group_get_for_dev(struct device *dev)
 	const struct iommu_ops *ops = dev->bus->iommu_ops;
 	struct iommu_group *group;
 	int ret;
+	struct irq_domain *d = dev_get_msi_domain(dev);

 	group = iommu_group_get(dev);
 	if (group)
@@ -1070,6 +1072,11 @@ struct iommu_group *iommu_group_get_for_dev(struct device *dev)
 		return ERR_PTR(ret);
 	}

+	if (d && irq_domain_is_msi(d) &&
+	    irq_domain_hierarchical_is_msi_remap(d))
+		iommu_group_set_caps(group, 0,
+				     IOMMU_GROUP_CAP_ISOLATE_MSIX);
+
 	return group;
 }
--
2.11.0
[RFC PATCH v5 1/5] iommu: Add capabilities to a group
This introduces capabilities to IOMMU groups. The first defined
capability is IOMMU_GROUP_CAP_ISOLATE_MSIX, which tells IOMMU group
users that a particular IOMMU group is capable of MSI-X message
filtering; this is useful when deciding whether or not to allow mapping
of the MSI-X table to userspace. Various architectures will enable it
when they decide that it is safe to do so.

Signed-off-by: Alexey Kardashevskiy
---
 include/linux/iommu.h | 20 ++++++++++++++++++++
 drivers/iommu/iommu.c | 28 ++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2cb54adc4a33..6b6f3c2f4904 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -155,6 +155,9 @@ struct iommu_resv_region {
 	enum iommu_resv_type	type;
 };

+/* IOMMU group capabilities */
+#define IOMMU_GROUP_CAP_ISOLATE_MSIX	(1U << 0)
+
 #ifdef CONFIG_IOMMU_API

 /**
@@ -312,6 +315,11 @@ extern void *iommu_group_get_iommudata(struct iommu_group *group);
 extern void iommu_group_set_iommudata(struct iommu_group *group,
 				      void *iommu_data,
 				      void (*release)(void *iommu_data));
+extern void iommu_group_set_caps(struct iommu_group *group,
+				 unsigned long clearcaps,
+				 unsigned long setcaps);
+extern bool iommu_group_is_capable(struct iommu_group *group,
+				   unsigned long cap);
 extern int iommu_group_set_name(struct iommu_group *group, const char *name);
 extern int iommu_group_add_device(struct iommu_group *group,
 				  struct device *dev);
@@ -513,6 +521,18 @@ static inline void iommu_group_set_iommudata(struct iommu_group *group,
 {
 }

+static inline void iommu_group_set_caps(struct iommu_group *group,
+					unsigned long clearcaps,
+					unsigned long setcaps)
+{
+}
+
+static inline bool iommu_group_is_capable(struct iommu_group *group,
+					  unsigned long cap)
+{
+	return false;
+}
+
 static inline int iommu_group_set_name(struct iommu_group *group,
 				       const char *name)
 {
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3f6ea160afed..6b2c34fe2c3d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -52,6 +52,7 @@ struct iommu_group {
 	void (*iommu_data_release)(void *iommu_data);
 	char *name;
 	int id;
+	unsigned long caps;
 	struct iommu_domain *default_domain;
 	struct iommu_domain *domain;
 };
@@ -447,6 +448,33 @@ void iommu_group_set_iommudata(struct iommu_group *group, void *iommu_data,
 EXPORT_SYMBOL_GPL(iommu_group_set_iommudata);

 /**
+ * iommu_group_set_caps - Change the group capabilities
+ * @group: the group
+ * @clearcaps: capabilities mask to remove
+ * @setcaps: capabilities mask to add
+ *
+ * IOMMU groups can be capable of various features which device drivers
+ * may read and adjust their behavior accordingly.
+ */
+void iommu_group_set_caps(struct iommu_group *group,
+			  unsigned long clearcaps, unsigned long setcaps)
+{
+	group->caps &= ~clearcaps;
+	group->caps |= setcaps;
+}
+EXPORT_SYMBOL_GPL(iommu_group_set_caps);
+
+/**
+ * iommu_group_is_capable - Returns if a group capability is present
+ * @group: the group
+ * @cap: the capability to test
+ */
+bool iommu_group_is_capable(struct iommu_group *group, unsigned long cap)
+{
+	return !!(group->caps & cap);
+}
+EXPORT_SYMBOL_GPL(iommu_group_is_capable);
+
+/**
  * iommu_group_set_name - set name for a group
  * @group: the group
  * @name: name
--
2.11.0
[RFC PATCH v5 0/5] vfio-pci: Add support for mmapping MSI-X table
This is a followup for "[PATCH kernel v4 0/6] vfio-pci: Add support for
mmapping MSI-X table"
http://www.spinics.net/lists/kvm/msg152232.html

This time it is using "caps" in IOMMU groups. The main question is
whether PCI bus flags or IOMMU domains would still be better (and which
one).

Here is some background:

The current vfio-pci implementation disallows mmapping the page
containing the MSI-X table, in case users write directly to the MSI-X
table and generate incorrect MSIs. However, this causes a performance
issue when critical device registers sit in the same page as the MSI-X
table: we have to handle the MMIO accesses to those registers in QEMU
emulation rather than in the guest.

To solve this issue, this series allows the MSI-X table to be exposed
to userspace when the hardware enables the interrupt remapping
capability, which ensures that a given PCI device can only shoot the
MSIs assigned to it. Previous versions introduced a new bus flag,
PCI_BUS_FLAGS_MSI_REMAP, to test this capability on the PCI side for
different archs.

This is based on sha1 26c5cebfdb6c "Merge branch 'parisc-4.13-4' of
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux".

Please comment. Thanks.
Changelog:
v5:
* redid the whole thing via so-called IOMMU group capabilities

v4:
* rebased on recent upstream
* got all 6 patches from v2 (v3 was missing some)

Alexey Kardashevskiy (5):
  iommu: Add capabilities to a group
  iommu: Set IOMMU_GROUP_CAP_ISOLATE_MSIX if MSI controller enables IRQ
    remapping
  iommu/intel/amd: Set IOMMU_GROUP_CAP_ISOLATE_MSIX if IRQ remapping is
    enabled
  powerpc/iommu: Set IOMMU_GROUP_CAP_ISOLATE_MSIX
  vfio-pci: Allow to expose MSI-X table to userspace when safe

 include/linux/iommu.h            | 20 ++++++++++++++++++++
 include/linux/vfio.h             |  1 +
 arch/powerpc/kernel/iommu.c      |  1 +
 drivers/iommu/amd_iommu.c        |  3 +++
 drivers/iommu/intel-iommu.c      |  3 +++
 drivers/iommu/iommu.c            | 35 +++++++++++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci.c      | 20 +++++++++++++++-----
 drivers/vfio/pci/vfio_pci_rdwr.c |  5 ++++-
 drivers/vfio/vfio.c              | 15 +++++++++++++++
 9 files changed, 99 insertions(+), 4 deletions(-)
--
2.11.0