[PATCH AUTOSEL 5.14 29/40] swiotlb-xen: ensure to issue well-formed XENMEM_exchange requests
From: Jan Beulich [ Upstream commit 9074c79b62b6e0d91d7f716c6e4e9968eaf9e043 ] While the hypervisor hasn't been enforcing this, we would still better avoid issuing requests with GFNs not aligned to the requested order. Instead of altering the value also in the call to panic(), drop it there for being static and hence easy to determine without being part of the panic message. Signed-off-by: Jan Beulich Reviewed-by: Stefano Stabellini Link: https://lore.kernel.org/r/7b3998e3-1233-4e5a-89ec-d740e77eb...@suse.com Signed-off-by: Juergen Gross Signed-off-by: Sasha Levin --- drivers/xen/swiotlb-xen.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index dbb18dc956f3..de4f55154d49 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -232,10 +232,11 @@ void __init xen_swiotlb_init_early(void) /* * Get IO TLB memory from any location. */ - start = memblock_alloc(PAGE_ALIGN(bytes), PAGE_SIZE); + start = memblock_alloc(PAGE_ALIGN(bytes), + IO_TLB_SEGSIZE << IO_TLB_SHIFT); if (!start) - panic("%s: Failed to allocate %lu bytes align=0x%lx\n", - __func__, PAGE_ALIGN(bytes), PAGE_SIZE); + panic("%s: Failed to allocate %lu bytes\n", + __func__, PAGE_ALIGN(bytes)); /* * And replace that memory with pages under 4GB. -- 2.33.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH V5 12/12] net: netvsc: Add Isolation VM support for netvsc driver
On Mon, Sep 27, 2021 at 10:26:43PM +0800, Tianyu Lan wrote: > Hi Christoph: > Gentile ping. The swiotlb and shared memory mapping changes in this > patchset needs your reivew. Could you have a look? I'm a little too busy for a review of such a huge patchset right now. That being said here are my comments from a very quick review: - the bare memremap usage in swiotlb looks strange and I'd definitively expect a well documented wrapper. - given that we can now hand out swiotlb memory for coherent mappings we need to carefully audit what happens when this memremaped memory gets mmaped or used through dma_get_sgtable - the netscv changes I'm not happy with at all. A large part of it is that the driver already has a bad structure, but this series is making it significantly worse. We'll need to find a way to use the proper dma mapping abstractions here. One option if you want to stick to the double vmapped buffer would be something like using dma_alloc_noncontigous plus a variant of dma_vmap_noncontiguous that takes the shared_gpa_boundary into account. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/mediatek: fix out-of-range warning with clang
On Mon, 2021-09-27 at 14:18 +0200, Arnd Bergmann wrote: > From: Arnd Bergmann > > clang-14 notices that a comparison is never true when > CONFIG_PHYS_ADDR_T_64BIT is disabled: > > drivers/iommu/mtk_iommu.c:553:34: error: result of comparison of > constant 5368709120 with expression of type 'phys_addr_t' (aka > 'unsigned int') is always false [-Werror,-Wtautological-constant-out- > of-range-compare] > if (dom->data->enable_4GB && pa >= > MTK_IOMMU_4GB_MODE_REMAP_BASE) > ~~ > ^ ~ > > Add an explicit check for the type of the variable to skip the check > and the warning in that case. > > Fixes: b4dad40e4f35 ("iommu/mediatek: Adjust the PA for the 4GB > Mode") > Signed-off-by: Arnd Bergmann Reviewed-by: Yong Wu Thanks. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP
On 9/27/21 2:02 PM, Luck, Tony wrote: > Or are you thinking of a helper that does both the check > and the update ... so the code here could be: > > update_one_xsave_feature(XFEATURE_PASID, _val, sizeof(msr_val)); > > With the helper being something like: > > void update_one_xsave_feature(enum xfeature xfeature, void *data, size_t size) > { > if (xsave_state_in_memory(args ...)) { > addr = get_xsave_addr(xsave, xfeature); > memcpy(addr, data, size); > xsave->header.xfeatures |= (1 << xfeature); > return; > } > > switch (xfeature) { > case XFEATURE_PASID: > wrmsrl(MSR_IA32_PASID, *(u64 *)data); > break; > > case each_of_the_other_XFEATURE_enums: > code to update registers for that XFEATURE > } > } > > either way needs the definitive correct coding for xsave_state_in_memory() That's close to what we want. The size should probably be implicit. If it isn't implicit, it needs to at least be double-checked against the state sizes. Not to get too fancy, but I think we also want to have a "replace" operation which is separate from the "update". Think of a case where you are trying to set a bit: struct pkru_state *pk = start_update_xstate(tsk, XSTATE_PKRU); pk->pkru |= 0x100; finish_update_xstate(tsk, XSTATE_PKRU, pk); versus setting a whole new value: struct pkru_state *pk = start_replace_xstate(tsk, XSTATE_PKRU); memset(pkru, sizeof(*pk), 0); pk->pkru = 0x1234; finish_replace_xstate(tsk, XSTATE_PKRU, pk); They look similar, but they actually might have very different costs for big states like AMX. We can also do some very different debugging for them. In normal operation, these _can_ just return pointers directly to the fpu...->xstate in some case. But, if we're debugging, we could kmalloc() a buffer and do sanity checking before it goes into the task buffer. We don't have to do any of that fancy stuff now. We don't even need to do an "update" if all we need for now for XFEATURE_PASID is a "replace". 1. Hide whether we need to write to real registers 2. Hide whether we need to update the in-memory image 3. Hide other FPU infrastructure like the TIF flag. 4. Make the users deal with a *whole* state in the replace API ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 1/9] cxl: Convert "RBI" to enum
Please spell out "register block indicator" in the subject so that the shortlog remains somewhat readable. On Thu, Sep 23, 2021 at 10:27 AM Ben Widawsky wrote: > > In preparation for passing around the Register Block Indicator (RBI) as > a parameter, it is desirable to convert the type to an enum so that the > interface can use a well defined type checked parameter. C wouldn't type check this unless it failed an integer conversion, right? It would need to be a struct to get useful type checking. I don't mind this for the self documenting properties it has for the functions that will take this as a parameter, but maybe clarify what you mean by type checked parameter? > > As a result of this change, should future versions of the spec add > sparsely defined identifiers, it could become a problem if checking for > invalid identifiers since the code currently checks for the max > identifier. This is not an issue with current spec, and the algorithm to > obtain the register blocks will change before those possible additions > are made. In general let's not spend changelog space trying to guess what future specs may or may not do. I.e. I think this text can be dropped, especially because enums can support sparse number spaces. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part)
Hi Eric, > This is based on Jean-Philippe's > [PATCH v14 00/10] iommu: I/O page faults for SMMUv3 > https://www.spinics.net/lists/arm-kernel/msg886518.html > (including the patches that were not pulled for 5.13) > Jean's patches have been merged to v5.14. Do you anticipate IOMMU/VFIO part patches getting into upstream kernel soon? -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP
On Thu, Sep 23, 2021 at 04:17:05PM -0700, Andy Lutomirski wrote: > > + fpregs_lock(); > > + > > + /* > > +* If the task's FPU doesn't need to be loaded or is valid, directly > > +* write the IA32_PASID MSR. Otherwise, write the PASID state and > > +* the MSR will be loaded from the PASID state before returning to > > +* the user. > > +*/ > > + if (!test_thread_flag(TIF_NEED_FPU_LOAD) || > > + fpregs_state_valid(fpu, smp_processor_id())) { > > + wrmsrl(MSR_IA32_PASID, msr_val); > > Let me try to decode this. > > If the current task's FPU state is live or if the state is in memory but the > CPU regs just happen to match the copy in memory, then write the MSR. Else > write the value to memory. > > This is wrong. If !TIF_NEED_FPU_LOAD && fpregs_state_valid, you MUST NOT > MODIFY FPU STATE. This is not negotiable -- you will break coherence between > CPU regs and the memory image. The way you modify the current task's state > is either you modify it in CPU regs (if the kernel knows that the CPU regs > are the one and only source of truth) OR you modify it in memory and > invalidate any preserved copies (by zapping last_cpu). > > In any event, that particular bit of logic really doesn't belong in here -- > it belongs in some helper that gets it right, once. Andy, A helper sounds like a good idea. Can you flesh out what you would like that to look like? Is it just the "where is the live register state?" so the above could be written: if (xsave_state_in_memory(args ...)) update pasid bit of xsave state in memory else wrmsrl(MSR_IA32_PASID, msr_val); Or are you thinking of a helper that does both the check and the update ... so the code here could be: update_one_xsave_feature(XFEATURE_PASID, _val, sizeof(msr_val)); With the helper being something like: void update_one_xsave_feature(enum xfeature xfeature, void *data, size_t size) { if (xsave_state_in_memory(args ...)) { addr = get_xsave_addr(xsave, xfeature); memcpy(addr, data, size); xsave->header.xfeatures |= (1 << xfeature); return; } switch (xfeature) { case XFEATURE_PASID: wrmsrl(MSR_IA32_PASID, *(u64 *)data); break; case each_of_the_other_XFEATURE_enums: code to update registers for that XFEATURE } } either way needs the definitive correct coding for xsave_state_in_memory() -Tony ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] [RFC] qcom_scm: hide Kconfig symbol
On Mon, Sep 27, 2021 at 10:42 PM Bjorn Andersson wrote: > On Mon 27 Sep 13:15 PDT 2021, Arnd Bergmann wrote: > > On Mon, Sep 27, 2021 at 9:52 PM Bjorn Andersson > > wrote: > > > > An easier option might be to find a way to build QCOM_SCM without > > RESET_CONTROLLER for compile testing purposes. I don't know > > what would break from that. > > > > Afaict the reset API is properly stubbed and RESET_CONTROLLER is a bool, > so I think we can simply drop the "select" and the kernel will still > compile fine in all combinations. > > When it comes to runtime, we currently select RESET_CONTROLLER from the > Qualcomm common clocks. If that is dropped (why would it...) it seems > possible to build a custom kernel for msm8916 that we can boot and miss > the stubbed out "mss restart" reset line from the SCM. > > > So, let's just drop the select RESET_CONTROLLER from SCM for now. Ok, I've made that change locally, giving it more time on the randconfig build box now. Arnd ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] [RFC] qcom_scm: hide Kconfig symbol
On Mon 27 Sep 13:15 PDT 2021, Arnd Bergmann wrote: > On Mon, Sep 27, 2021 at 9:52 PM Bjorn Andersson > wrote: > > On Mon 27 Sep 08:22 PDT 2021, Arnd Bergmann wrote: > > > From: Arnd Bergmann > > > > > > - To avoid a circular dependency chain involving RESET_CONTROLLER > > >and PINCTRL_SUNXI, change the 'depends on RESET_CONTROLLER' in > > >the latter one to 'select'. > > > > Can you please help me understand why this is part of the same patch? > > This can be done as a preparatory patch if we decide to do it this way, > for the review it seemed better to spell out that this is required. > > I still hope that we can avoid adding another 'select RESET_CONTROLLER' > if someone can figure out what to do instead. > Okay, thanks. > The problem here is that QCOM_SCM selects RESET_CONTROLLER, > and turning that into 'depends on' would in turn mean that any driver that > wants to select QCOM_SCM would have to have the same RESET_CONTROLLER > dependency. > Right, and that will just be another thing we'll get wrong across the tree. > An easier option might be to find a way to build QCOM_SCM without > RESET_CONTROLLER for compile testing purposes. I don't know > what would break from that. > Afaict the reset API is properly stubbed and RESET_CONTROLLER is a bool, so I think we can simply drop the "select" and the kernel will still compile fine in all combinations. When it comes to runtime, we currently select RESET_CONTROLLER from the Qualcomm common clocks. If that is dropped (why would it...) it seems possible to build a custom kernel for msm8916 that we can boot and miss the stubbed out "mss restart" reset line from the SCM. So, let's just drop the select RESET_CONTROLLER from SCM for now. Regards, Bjorn ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] [RFC] qcom_scm: hide Kconfig symbol
On Mon, Sep 27, 2021 at 9:52 PM Bjorn Andersson wrote: > On Mon 27 Sep 08:22 PDT 2021, Arnd Bergmann wrote: > > From: Arnd Bergmann > > > > - To avoid a circular dependency chain involving RESET_CONTROLLER > >and PINCTRL_SUNXI, change the 'depends on RESET_CONTROLLER' in > >the latter one to 'select'. > > Can you please help me understand why this is part of the same patch? This can be done as a preparatory patch if we decide to do it this way, for the review it seemed better to spell out that this is required. I still hope that we can avoid adding another 'select RESET_CONTROLLER' if someone can figure out what to do instead. The problem here is that QCOM_SCM selects RESET_CONTROLLER, and turning that into 'depends on' would in turn mean that any driver that wants to select QCOM_SCM would have to have the same RESET_CONTROLLER dependency. An easier option might be to find a way to build QCOM_SCM without RESET_CONTROLLER for compile testing purposes. I don't know what would break from that. Arnd ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 04/20] PCI/P2PDMA: introduce helpers for dma_map_sg implementations
On 2021-09-27 12:53 p.m., Bjorn Helgaas wrote: > Acked-by: Bjorn Helgaas > > Ditto. Thanks Bjorn, I'll make these changes and add your Acks for subsequent postings. Logan ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] [RFC] qcom_scm: hide Kconfig symbol
On Mon 27 Sep 08:22 PDT 2021, Arnd Bergmann wrote: > From: Arnd Bergmann > > Now that SCM can be a loadable module, we have to add another > dependency to avoid link failures when ipa or adreno-gpu are > built-in: > > aarch64-linux-ld: drivers/net/ipa/ipa_main.o: in function `ipa_probe': > ipa_main.c:(.text+0xfc4): undefined reference to `qcom_scm_is_available' > > ld.lld: error: undefined symbol: qcom_scm_is_available > >>> referenced by adreno_gpu.c > >>> gpu/drm/msm/adreno/adreno_gpu.o:(adreno_zap_shader_load) in > >>> archive drivers/built-in.a > > This can happen when CONFIG_ARCH_QCOM is disabled and we don't select > QCOM_MDT_LOADER, but some other module selects QCOM_SCM. Ideally we'd > use a similar dependency here to what we have for QCOM_RPROC_COMMON, > but that causes dependency loops from other things selecting QCOM_SCM. > > This appears to be an endless problem, so try something different this > time: > > - CONFIG_QCOM_SCM becomes a hidden symbol that nothing 'depends on' >but that is simply selected by all of its users > > - All the stubs in include/linux/qcom_scm.h can go away > > - arm-smccc.h needs to provide a stub for __arm_smccc_smc() to >allow compile-testing QCOM_SCM on all architectures. > > - To avoid a circular dependency chain involving RESET_CONTROLLER >and PINCTRL_SUNXI, change the 'depends on RESET_CONTROLLER' in >the latter one to 'select'. Can you please help me understand why this is part of the same patch? > > The last bit is rather annoying, as drivers should generally never > 'select' another subsystem, and about half the users of the reset > controller interface do this anyway. > > Nevertheless, this version seems to pass all my randconfig tests > and is more robust than any of the prior versions. > > Comments? > I like it! It's less confusing than the current scheme, so should be easier to get right. And I like the fact that we don't need the stubs anymore. @John, could you please have a look at this as well, wrt GKI. Regards, Bjorn > Signed-off-by: Arnd Bergmann > --- > drivers/firmware/Kconfig| 4 +- > drivers/gpu/drm/msm/Kconfig | 4 +- > drivers/iommu/Kconfig | 2 +- > drivers/media/platform/Kconfig | 2 +- > drivers/mmc/host/Kconfig| 2 +- > drivers/net/ipa/Kconfig | 1 + > drivers/net/wireless/ath/ath10k/Kconfig | 2 +- > drivers/pinctrl/qcom/Kconfig| 3 +- > drivers/pinctrl/sunxi/Kconfig | 6 +-- > include/linux/arm-smccc.h | 10 > include/linux/qcom_scm.h| 71 - > 11 files changed, 23 insertions(+), 84 deletions(-) > > diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig > index 220a58cf0a44..f7dd82ef0b9c 100644 > --- a/drivers/firmware/Kconfig > +++ b/drivers/firmware/Kconfig > @@ -203,9 +203,7 @@ config INTEL_STRATIX10_RSU > Say Y here if you want Intel RSU support. > > config QCOM_SCM > - tristate "Qcom SCM driver" > - depends on ARM || ARM64 > - depends on HAVE_ARM_SMCCC > + tristate > select RESET_CONTROLLER > > config QCOM_SCM_DOWNLOAD_MODE_DEFAULT > diff --git a/drivers/gpu/drm/msm/Kconfig b/drivers/gpu/drm/msm/Kconfig > index e9c6af78b1d7..3ddf739a6f9b 100644 > --- a/drivers/gpu/drm/msm/Kconfig > +++ b/drivers/gpu/drm/msm/Kconfig > @@ -17,7 +17,7 @@ config DRM_MSM > select DRM_SCHED > select SHMEM > select TMPFS > - select QCOM_SCM if ARCH_QCOM > + select QCOM_SCM > select WANT_DEV_COREDUMP > select SND_SOC_HDMI_CODEC if SND_SOC > select SYNC_FILE > @@ -55,7 +55,7 @@ config DRM_MSM_GPU_SUDO > > config DRM_MSM_HDMI_HDCP > bool "Enable HDMI HDCP support in MSM DRM driver" > - depends on DRM_MSM && QCOM_SCM > + depends on DRM_MSM > default y > help > Choose this option to enable HDCP state machine > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig > index 124c41adeca1..989c83acbfee 100644 > --- a/drivers/iommu/Kconfig > +++ b/drivers/iommu/Kconfig > @@ -308,7 +308,7 @@ config APPLE_DART > config ARM_SMMU > tristate "ARM Ltd. System MMU (SMMU) Support" > depends on ARM64 || ARM || (COMPILE_TEST && !GENERIC_ATOMIC64) > - depends on QCOM_SCM || !QCOM_SCM #if QCOM_SCM=m this can't be =y > + select QCOM_SCM > select IOMMU_API > select IOMMU_IO_PGTABLE_LPAE > select ARM_DMA_USE_IOMMU if ARM > diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig > index 157c924686e4..80321e03809a 100644 > --- a/drivers/media/platform/Kconfig > +++ b/drivers/media/platform/Kconfig > @@ -565,7 +565,7 @@ config VIDEO_QCOM_VENUS > depends on VIDEO_DEV && VIDEO_V4L2 && QCOM_SMEM > depends on (ARCH_QCOM && IOMMU_DMA) || COMPILE_TEST > select QCOM_MDT_LOADER if ARCH_QCOM > - select QCOM_SCM if ARCH_QCOM > + select QCOM_SCM >
Re: [PATCH v3 04/20] PCI/P2PDMA: introduce helpers for dma_map_sg implementations
On Thu, Sep 16, 2021 at 05:40:44PM -0600, Logan Gunthorpe wrote: > Add pci_p2pdma_map_segment() as a helper for simple dma_map_sg() > implementations. It takes an scatterlist segment that must point to a > pci_p2pdma struct page and will map it if the mapping requires a bus > address. > > The return value indicates whether the mapping required a bus address > or whether the caller still needs to map the segment normally. If the > segment should not be mapped, -EREMOTEIO is returned. > > This helper uses a state structure to track the changes to the > pgmap across calls and avoid needing to lookup into the xarray for > every page. > > Also add pci_p2pdma_map_bus_segment() which is useful for IOMMU > dma_map_sg() implementations where the sg segment containing the page > differs from the sg segment containing the DMA address. > > Signed-off-by: Logan Gunthorpe Acked-by: Bjorn Helgaas Ditto. > --- > drivers/pci/p2pdma.c | 59 ++ > include/linux/pci-p2pdma.h | 21 ++ > 2 files changed, 80 insertions(+) > > diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c > index b656d8c801a7..58c34f1f1473 100644 > --- a/drivers/pci/p2pdma.c > +++ b/drivers/pci/p2pdma.c > @@ -943,6 +943,65 @@ void pci_p2pdma_unmap_sg_attrs(struct device *dev, > struct scatterlist *sg, > } > EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs); > > +/** > + * pci_p2pdma_map_segment - map an sg segment determining the mapping type > + * @state: State structure that should be declared outside of the > for_each_sg() > + * loop and initialized to zero. > + * @dev: DMA device that's doing the mapping operation > + * @sg: scatterlist segment to map > + * > + * This is a helper to be used by non-iommu dma_map_sg() implementations > where > + * the sg segment is the same for the page_link and the dma_address. s/non-iommu/non-IOMMU/ > + * > + * Attempt to map a single segment in an SGL with the PCI bus address. > + * The segment must point to a PCI P2PDMA page and thus must be > + * wrapped in a is_pci_p2pdma_page(sg_page(sg)) check. > + * > + * Returns the type of mapping used and maps the page if the type is > + * PCI_P2PDMA_MAP_BUS_ADDR. > + */ > +enum pci_p2pdma_map_type > +pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device > *dev, > +struct scatterlist *sg) > +{ > + if (state->pgmap != sg_page(sg)->pgmap) { > + state->pgmap = sg_page(sg)->pgmap; > + state->map = pci_p2pdma_map_type(state->pgmap, dev); > + state->bus_off = to_p2p_pgmap(state->pgmap)->bus_offset; > + } > + > + if (state->map == PCI_P2PDMA_MAP_BUS_ADDR) { > + sg->dma_address = sg_phys(sg) + state->bus_off; > + sg_dma_len(sg) = sg->length; > + sg_dma_mark_pci_p2pdma(sg); > + } > + > + return state->map; > +} > + > +/** > + * pci_p2pdma_map_bus_segment - map an sg segment pre determined to > + * be mapped with PCI_P2PDMA_MAP_BUS_ADDR > + * @pg_sg: scatterlist segment with the page to map > + * @dma_sg: scatterlist segment to assign a dma address to s/dma address/DMA address/, also below > + * > + * This is a helper for iommu dma_map_sg() implementations when the > + * segment for the dma address differs from the segment containing the > + * source page. > + * > + * pci_p2pdma_map_type() must have already been called on the pg_sg and > + * returned PCI_P2PDMA_MAP_BUS_ADDR. > + */ > +void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg, > + struct scatterlist *dma_sg) > +{ > + struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(sg_page(pg_sg)->pgmap); > + > + dma_sg->dma_address = sg_phys(pg_sg) + pgmap->bus_offset; > + sg_dma_len(dma_sg) = pg_sg->length; > + sg_dma_mark_pci_p2pdma(dma_sg); > +} > + > /** > * pci_p2pdma_enable_store - parse a configfs/sysfs attribute store > * to enable p2pdma > diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h > index caac2d023f8f..e5a8d5bc0f51 100644 > --- a/include/linux/pci-p2pdma.h > +++ b/include/linux/pci-p2pdma.h > @@ -13,6 +13,12 @@ > > #include > > +struct pci_p2pdma_map_state { > + struct dev_pagemap *pgmap; > + int map; > + u64 bus_off; > +}; > + > struct block_device; > struct scatterlist; > > @@ -70,6 +76,11 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct > scatterlist *sg, > int nents, enum dma_data_direction dir, unsigned long attrs); > void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg, > int nents, enum dma_data_direction dir, unsigned long attrs); > +enum pci_p2pdma_map_type > +pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device > *dev, > +struct scatterlist *sg); > +void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg, > + struct scatterlist *dma_sg); > int pci_p2pdma_enable_store(const
Re: [PATCH v3 13/20] PCI/P2PDMA: remove pci_p2pdma_[un]map_sg()
On Thu, Sep 16, 2021 at 05:40:53PM -0600, Logan Gunthorpe wrote: > This interface is superseded by support in dma_map_sg() which now supports > heterogeneous scatterlists. There are no longer any users, so remove it. > > Signed-off-by: Logan Gunthorpe Acked-by: Bjorn Helgaas Ditto. > --- > drivers/pci/p2pdma.c | 65 -- > include/linux/pci-p2pdma.h | 27 > 2 files changed, 92 deletions(-) > > diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c > index 58c34f1f1473..4478633346bd 100644 > --- a/drivers/pci/p2pdma.c > +++ b/drivers/pci/p2pdma.c > @@ -878,71 +878,6 @@ enum pci_p2pdma_map_type pci_p2pdma_map_type(struct > dev_pagemap *pgmap, > return type; > } > > -static int __pci_p2pdma_map_sg(struct pci_p2pdma_pagemap *p2p_pgmap, > - struct device *dev, struct scatterlist *sg, int nents) > -{ > - struct scatterlist *s; > - int i; > - > - for_each_sg(sg, s, nents, i) { > - s->dma_address = sg_phys(s) - p2p_pgmap->bus_offset; > - sg_dma_len(s) = s->length; > - } > - > - return nents; > -} > - > -/** > - * pci_p2pdma_map_sg_attrs - map a PCI peer-to-peer scatterlist for DMA > - * @dev: device doing the DMA request > - * @sg: scatter list to map > - * @nents: elements in the scatterlist > - * @dir: DMA direction > - * @attrs: DMA attributes passed to dma_map_sg() (if called) > - * > - * Scatterlists mapped with this function should be unmapped using > - * pci_p2pdma_unmap_sg_attrs(). > - * > - * Returns the number of SG entries mapped or 0 on error. > - */ > -int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg, > - int nents, enum dma_data_direction dir, unsigned long attrs) > -{ > - struct pci_p2pdma_pagemap *p2p_pgmap = > - to_p2p_pgmap(sg_page(sg)->pgmap); > - > - switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev)) { > - case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: > - return dma_map_sg_attrs(dev, sg, nents, dir, attrs); > - case PCI_P2PDMA_MAP_BUS_ADDR: > - return __pci_p2pdma_map_sg(p2p_pgmap, dev, sg, nents); > - default: > - return 0; > - } > -} > -EXPORT_SYMBOL_GPL(pci_p2pdma_map_sg_attrs); > - > -/** > - * pci_p2pdma_unmap_sg_attrs - unmap a PCI peer-to-peer scatterlist that was > - * mapped with pci_p2pdma_map_sg() > - * @dev: device doing the DMA request > - * @sg: scatter list to map > - * @nents: number of elements returned by pci_p2pdma_map_sg() > - * @dir: DMA direction > - * @attrs: DMA attributes passed to dma_unmap_sg() (if called) > - */ > -void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg, > - int nents, enum dma_data_direction dir, unsigned long attrs) > -{ > - enum pci_p2pdma_map_type map_type; > - > - map_type = pci_p2pdma_map_type(sg_page(sg)->pgmap, dev); > - > - if (map_type == PCI_P2PDMA_MAP_THRU_HOST_BRIDGE) > - dma_unmap_sg_attrs(dev, sg, nents, dir, attrs); > -} > -EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs); > - > /** > * pci_p2pdma_map_segment - map an sg segment determining the mapping type > * @state: State structure that should be declared outside of the > for_each_sg() > diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h > index e5a8d5bc0f51..0c33a40a86e7 100644 > --- a/include/linux/pci-p2pdma.h > +++ b/include/linux/pci-p2pdma.h > @@ -72,10 +72,6 @@ void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct > scatterlist *sgl); > void pci_p2pmem_publish(struct pci_dev *pdev, bool publish); > enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap, >struct device *dev); > -int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg, > - int nents, enum dma_data_direction dir, unsigned long attrs); > -void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg, > - int nents, enum dma_data_direction dir, unsigned long attrs); > enum pci_p2pdma_map_type > pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device > *dev, > struct scatterlist *sg); > @@ -135,17 +131,6 @@ pci_p2pdma_map_type(struct dev_pagemap *pgmap, struct > device *dev) > { > return PCI_P2PDMA_MAP_NOT_SUPPORTED; > } > -static inline int pci_p2pdma_map_sg_attrs(struct device *dev, > - struct scatterlist *sg, int nents, enum dma_data_direction dir, > - unsigned long attrs) > -{ > - return 0; > -} > -static inline void pci_p2pdma_unmap_sg_attrs(struct device *dev, > - struct scatterlist *sg, int nents, enum dma_data_direction dir, > - unsigned long attrs) > -{ > -} > static inline enum pci_p2pdma_map_type > pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device > *dev, > struct scatterlist *sg) > @@ -181,16 +166,4 @@ static inline struct pci_dev
Re: [PATCH v3 02/20] PCI/P2PDMA: attempt to set map_type if it has not been set
On Thu, Sep 16, 2021 at 05:40:42PM -0600, Logan Gunthorpe wrote: > Attempt to find the mapping type for P2PDMA pages on the first > DMA map attempt if it has not been done ahead of time. > > Previously, the mapping type was expected to be calculated ahead of > time, but if pages are to come from userspace then there's no > way to ensure the path was checked ahead of time. > > With this change it's no longer invalid to call pci_p2pdma_map_sg() > before the mapping type is calculated so drop the WARN_ON when that > is the case. > > Signed-off-by: Logan Gunthorpe Acked-by: Bjorn Helgaas Capitalize subject line. > --- > drivers/pci/p2pdma.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c > index 50cdde3e9a8b..1192c465ba6d 100644 > --- a/drivers/pci/p2pdma.c > +++ b/drivers/pci/p2pdma.c > @@ -848,6 +848,7 @@ static enum pci_p2pdma_map_type > pci_p2pdma_map_type(struct dev_pagemap *pgmap, > struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider; > struct pci_dev *client; > struct pci_p2pdma *p2pdma; > + int dist; > > if (!provider->p2pdma) > return PCI_P2PDMA_MAP_NOT_SUPPORTED; > @@ -864,6 +865,10 @@ static enum pci_p2pdma_map_type > pci_p2pdma_map_type(struct dev_pagemap *pgmap, > type = xa_to_value(xa_load(>map_types, > map_types_idx(client))); > rcu_read_unlock(); > + > + if (type == PCI_P2PDMA_MAP_UNKNOWN) > + return calc_map_type_and_dist(provider, client, , false); > + > return type; > } > > @@ -906,7 +911,6 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct > scatterlist *sg, > case PCI_P2PDMA_MAP_BUS_ADDR: > return __pci_p2pdma_map_sg(p2p_pgmap, dev, sg, nents); > default: > - WARN_ON_ONCE(1); > return 0; > } > } > -- > 2.30.2 > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 19/20] PCI/P2PDMA: introduce pci_mmap_p2pmem()
On Thu, Sep 16, 2021 at 05:40:59PM -0600, Logan Gunthorpe wrote: > Introduce pci_mmap_p2pmem() which is a helper to allocate and mmap > a hunk of p2pmem into userspace. > > Pages are allocated from the genalloc in bulk and their reference count > incremented. They are returned to the genalloc when the page is put. > > The VMA does not take a reference to the pages when they are inserted > with vmf_insert_mixed() (which is necessary for zone device pages) so > the backing P2P memory is stored in a structures in vm_private_data. > > A pseudo mount is used to allocate an inode for each PCI device. The > inode's address_space is used in the file doing the mmap so that all > VMAs are collected and can be unmapped if the PCI device is unbound. > After unmapping, the VMAs are iterated through and their pages are > put so the device can continue to be unbound. An active flag is used > to signal to VMAs not to allocate any further P2P memory once the > removal process starts. The flag is synchronized with concurrent > access with an RCU lock. > > The VMAs and inode will survive after the unbind of the device, but no > pages will be present in the VMA and a subsequent access will result > in a SIGBUS error. > > Signed-off-by: Logan Gunthorpe Acked-by: Bjorn Helgaas I would capitalize "Introduce" in the subject line. > --- > drivers/pci/p2pdma.c | 263 - > include/linux/pci-p2pdma.h | 11 ++ > include/uapi/linux/magic.h | 1 + > 3 files changed, 273 insertions(+), 2 deletions(-) > > diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c > index 2422af5a529c..a5adf57af53a 100644 > --- a/drivers/pci/p2pdma.c > +++ b/drivers/pci/p2pdma.c > @@ -16,14 +16,19 @@ > #include > #include > #include > +#include > +#include > #include > #include > #include > +#include > > struct pci_p2pdma { > struct gen_pool *pool; > bool p2pmem_published; > struct xarray map_types; > + struct inode *inode; > + bool active; > }; > > struct pci_p2pdma_pagemap { > @@ -32,6 +37,14 @@ struct pci_p2pdma_pagemap { > u64 bus_offset; > }; > > +struct pci_p2pdma_map { > + struct kref ref; > + struct pci_dev *pdev; > + struct inode *inode; > + void *kaddr; > + size_t len; > +}; > + > static struct pci_p2pdma_pagemap *to_p2p_pgmap(struct dev_pagemap *pgmap) > { > return container_of(pgmap, struct pci_p2pdma_pagemap, pgmap); > @@ -100,6 +113,26 @@ static const struct attribute_group p2pmem_group = { > .name = "p2pmem", > }; > > +/* > + * P2PDMA internal mount > + * Fake an internal VFS mount-point in order to allocate struct address_space > + * mappings to remove VMAs on unbind events. > + */ > +static int pci_p2pdma_fs_cnt; > +static struct vfsmount *pci_p2pdma_fs_mnt; > + > +static int pci_p2pdma_fs_init_fs_context(struct fs_context *fc) > +{ > + return init_pseudo(fc, P2PDMA_MAGIC) ? 0 : -ENOMEM; > +} > + > +static struct file_system_type pci_p2pdma_fs_type = { > + .name = "p2dma", > + .owner = THIS_MODULE, > + .init_fs_context = pci_p2pdma_fs_init_fs_context, > + .kill_sb = kill_anon_super, > +}; > + > static void p2pdma_page_free(struct page *page) > { > struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(page->pgmap); > @@ -128,6 +161,9 @@ static void pci_p2pdma_release(void *data) > gen_pool_destroy(p2pdma->pool); > sysfs_remove_group(>dev.kobj, _group); > xa_destroy(>map_types); > + > + iput(p2pdma->inode); > + simple_release_fs(_p2pdma_fs_mnt, _p2pdma_fs_cnt); > } > > static int pci_p2pdma_setup(struct pci_dev *pdev) > @@ -145,17 +181,32 @@ static int pci_p2pdma_setup(struct pci_dev *pdev) > if (!p2p->pool) > goto out; > > - error = devm_add_action_or_reset(>dev, pci_p2pdma_release, pdev); > + error = simple_pin_fs(_p2pdma_fs_type, _p2pdma_fs_mnt, > + _p2pdma_fs_cnt); > if (error) > goto out_pool_destroy; > > + p2p->inode = alloc_anon_inode(pci_p2pdma_fs_mnt->mnt_sb); > + if (IS_ERR(p2p->inode)) { > + error = -ENOMEM; > + goto out_unpin_fs; > + } > + > + error = devm_add_action_or_reset(>dev, pci_p2pdma_release, pdev); > + if (error) > + goto out_put_inode; > + > error = sysfs_create_group(>dev.kobj, _group); > if (error) > - goto out_pool_destroy; > + goto out_put_inode; > > rcu_assign_pointer(pdev->p2pdma, p2p); > return 0; > > +out_put_inode: > + iput(p2p->inode); > +out_unpin_fs: > + simple_release_fs(_p2pdma_fs_mnt, _p2pdma_fs_cnt); > out_pool_destroy: > gen_pool_destroy(p2p->pool); > out: > @@ -163,6 +214,45 @@ static int pci_p2pdma_setup(struct pci_dev *pdev) > return error; > } > > +static void pci_p2pdma_map_free_pages(struct pci_p2pdma_map *pmap) > +{ > + int i; > + > + if (!pmap->kaddr) > + return; > + > +
Re: [PATCH v3 03/20] PCI/P2PDMA: make pci_p2pdma_map_type() non-static
On Thu, Sep 16, 2021 at 05:40:43PM -0600, Logan Gunthorpe wrote: > pci_p2pdma_map_type() will be needed by the dma-iommu map_sg > implementation because it will need to determine the mapping type > ahead of actually doing the mapping to create the actual iommu mapping. I don't expect this to go via the PCI tree, but if it did I would silently: s/PCI/P2PDMA: make pci_p2pdma_map_type() non-static/ PCI/P2PDMA: Expose pci_p2pdma_map_type()/ s/iommu/IOMMU/ and mention what this patch does in the commit log (in addition to the subject) and fix a couple minor typos below. > Signed-off-by: Logan Gunthorpe Acked-by: Bjorn Helgaas > --- > drivers/pci/p2pdma.c | 24 +- > include/linux/pci-p2pdma.h | 41 ++ > 2 files changed, 56 insertions(+), 9 deletions(-) > > diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c > index 1192c465ba6d..b656d8c801a7 100644 > --- a/drivers/pci/p2pdma.c > +++ b/drivers/pci/p2pdma.c > @@ -20,13 +20,6 @@ > #include > #include > > -enum pci_p2pdma_map_type { > - PCI_P2PDMA_MAP_UNKNOWN = 0, > - PCI_P2PDMA_MAP_NOT_SUPPORTED, > - PCI_P2PDMA_MAP_BUS_ADDR, > - PCI_P2PDMA_MAP_THRU_HOST_BRIDGE, > -}; > - > struct pci_p2pdma { > struct gen_pool *pool; > bool p2pmem_published; > @@ -841,8 +834,21 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool > publish) > } > EXPORT_SYMBOL_GPL(pci_p2pmem_publish); > > -static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap > *pgmap, > - struct device *dev) > +/** > + * pci_p2pdma_map_type - return the type of mapping that should be used for > + * a given device and pgmap > + * @pgmap: the pagemap of a page to determine the mapping type for > + * @dev: device that is mapping the page > + * > + * Returns one of: > + * PCI_P2PDMA_MAP_NOT_SUPPORTED - The mapping should not be done > + * PCI_P2PDMA_MAP_BUS_ADDR - The mapping should use the PCI bus address > + * PCI_P2PDMA_MAP_THRU_HOST_BRIDGE - The mapping should be done normally > + * using the CPU physical address (in dma-direct) or an IOVA > + * mapping for the IOMMU. > + */ > +enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap, > + struct device *dev) > { > enum pci_p2pdma_map_type type = PCI_P2PDMA_MAP_NOT_SUPPORTED; > struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider; > diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h > index 8318a97c9c61..caac2d023f8f 100644 > --- a/include/linux/pci-p2pdma.h > +++ b/include/linux/pci-p2pdma.h > @@ -16,6 +16,40 @@ > struct block_device; > struct scatterlist; > > +enum pci_p2pdma_map_type { > + /* > + * PCI_P2PDMA_MAP_UNKNOWN: Used internally for indicating the mapping > + * type hasn't been calculated yet. Functions that return this enum > + * never return this value. > + */ > + PCI_P2PDMA_MAP_UNKNOWN = 0, > + > + /* > + * PCI_P2PDMA_MAP_NOT_SUPPORTED: Indicates the transaction will > + * traverse the host bridge and the host bridge is not in the > + * whitelist. DMA Mapping routines should return an error when > + * this is returned. > + */ > + PCI_P2PDMA_MAP_NOT_SUPPORTED, > + > + /* > + * PCI_P2PDMA_BUS_ADDR: Indicates that two devices can talk to > + * eachother directly through a PCI switch and the transaction will > + * not traverse the host bridge. Such a mapping should program > + * the DMA engine with PCI bus addresses. s/eachother/each other/ > + */ > + PCI_P2PDMA_MAP_BUS_ADDR, > + > + /* > + * PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: Indicates two devices can talk > + * to eachother, but the transaction traverses a host bridge on the > + * whitelist. In this case, a normal mapping either with CPU physical > + * addresses (in the case of dma-direct) or IOVA addresses (in the > + * case of IOMMUs) should be used to program the DMA engine. s/eachother/each other/ > + */ > + PCI_P2PDMA_MAP_THRU_HOST_BRIDGE, > +}; > + > #ifdef CONFIG_PCI_P2PDMA > int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size, > u64 offset); > @@ -30,6 +64,8 @@ struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev > *pdev, >unsigned int *nents, u32 length); > void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl); > void pci_p2pmem_publish(struct pci_dev *pdev, bool publish); > +enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap, > + struct device *dev); > int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg, > int nents, enum dma_data_direction dir, unsigned long attrs); > void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg, > @@ -83,6 +119,11
Re: [PATCH v2 6/9] PCI: Add pci_find_dvsec_capability to find designated VSEC
s/pci_find_dvsec_capability/pci_find_dvsec_capability()/ in subject and commit log. On Thu, Sep 23, 2021 at 10:26:44AM -0700, Ben Widawsky wrote: > Add pci_find_dvsec_capability to locate a Designated Vendor-Specific > Extended Capability with the specified DVSEC ID. "specified Vendor ID and Capability ID". > The Designated Vendor-Specific Extended Capability (DVSEC) allows one or > more vendor specific capabilities that aren't tied to the vendor ID of > the PCI component. > > DVSEC is critical for both the Compute Express Link (CXL) driver as well > as the driver for OpenCAPI coherent accelerator (OCXL). Strictly speaking, not really relevant for the commit log. > Cc: David E. Box > Cc: Jonathan Cameron > Cc: Bjorn Helgaas > Cc: Dan Williams > Cc: linux-...@vger.kernel.org > Cc: linuxppc-...@lists.ozlabs.org > Cc: Andrew Donnellan > Cc: Lu Baolu > Reviewed-by: Frederic Barrat > Signed-off-by: Ben Widawsky If you want to merge this with the series, Acked-by: Bjorn Helgaas Or if you want me to merge this on a branch, let me know. > --- > drivers/pci/pci.c | 32 > include/linux/pci.h | 1 + > 2 files changed, 33 insertions(+) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index ce2ab62b64cf..94ac86ff28b0 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -732,6 +732,38 @@ u16 pci_find_vsec_capability(struct pci_dev *dev, u16 > vendor, int cap) > } > EXPORT_SYMBOL_GPL(pci_find_vsec_capability); > > +/** > + * pci_find_dvsec_capability - Find DVSEC for vendor > + * @dev: PCI device to query > + * @vendor: Vendor ID to match for the DVSEC > + * @dvsec: Designated Vendor-specific capability ID > + * > + * If DVSEC has Vendor ID @vendor and DVSEC ID @dvsec return the capability > + * offset in config space; otherwise return 0. > + */ > +u16 pci_find_dvsec_capability(struct pci_dev *dev, u16 vendor, u16 dvsec) > +{ > + int pos; > + > + pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DVSEC); > + if (!pos) > + return 0; > + > + while (pos) { > + u16 v, id; > + > + pci_read_config_word(dev, pos + PCI_DVSEC_HEADER1, ); > + pci_read_config_word(dev, pos + PCI_DVSEC_HEADER2, ); > + if (vendor == v && dvsec == id) > + return pos; > + > + pos = pci_find_next_ext_capability(dev, pos, > PCI_EXT_CAP_ID_DVSEC); > + } > + > + return 0; > +} > +EXPORT_SYMBOL_GPL(pci_find_dvsec_capability); > + > /** > * pci_find_parent_resource - return resource region of parent bus of given > * region > diff --git a/include/linux/pci.h b/include/linux/pci.h > index cd8aa6fce204..c93ccfa4571b 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -1130,6 +1130,7 @@ u16 pci_find_ext_capability(struct pci_dev *dev, int > cap); > u16 pci_find_next_ext_capability(struct pci_dev *dev, u16 pos, int cap); > struct pci_bus *pci_find_next_bus(const struct pci_bus *from); > u16 pci_find_vsec_capability(struct pci_dev *dev, u16 vendor, int cap); > +u16 pci_find_dvsec_capability(struct pci_dev *dev, u16 vendor, u16 dvsec); > > u64 pci_get_dsn(struct pci_dev *dev); > > -- > 2.33.0 > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] [RFC] qcom_scm: hide Kconfig symbol
From: Arnd Bergmann Now that SCM can be a loadable module, we have to add another dependency to avoid link failures when ipa or adreno-gpu are built-in: aarch64-linux-ld: drivers/net/ipa/ipa_main.o: in function `ipa_probe': ipa_main.c:(.text+0xfc4): undefined reference to `qcom_scm_is_available' ld.lld: error: undefined symbol: qcom_scm_is_available >>> referenced by adreno_gpu.c >>> gpu/drm/msm/adreno/adreno_gpu.o:(adreno_zap_shader_load) in >>> archive drivers/built-in.a This can happen when CONFIG_ARCH_QCOM is disabled and we don't select QCOM_MDT_LOADER, but some other module selects QCOM_SCM. Ideally we'd use a similar dependency here to what we have for QCOM_RPROC_COMMON, but that causes dependency loops from other things selecting QCOM_SCM. This appears to be an endless problem, so try something different this time: - CONFIG_QCOM_SCM becomes a hidden symbol that nothing 'depends on' but that is simply selected by all of its users - All the stubs in include/linux/qcom_scm.h can go away - arm-smccc.h needs to provide a stub for __arm_smccc_smc() to allow compile-testing QCOM_SCM on all architectures. - To avoid a circular dependency chain involving RESET_CONTROLLER and PINCTRL_SUNXI, change the 'depends on RESET_CONTROLLER' in the latter one to 'select'. The last bit is rather annoying, as drivers should generally never 'select' another subsystem, and about half the users of the reset controller interface do this anyway. Nevertheless, this version seems to pass all my randconfig tests and is more robust than any of the prior versions. Comments? Signed-off-by: Arnd Bergmann --- drivers/firmware/Kconfig| 4 +- drivers/gpu/drm/msm/Kconfig | 4 +- drivers/iommu/Kconfig | 2 +- drivers/media/platform/Kconfig | 2 +- drivers/mmc/host/Kconfig| 2 +- drivers/net/ipa/Kconfig | 1 + drivers/net/wireless/ath/ath10k/Kconfig | 2 +- drivers/pinctrl/qcom/Kconfig| 3 +- drivers/pinctrl/sunxi/Kconfig | 6 +-- include/linux/arm-smccc.h | 10 include/linux/qcom_scm.h| 71 - 11 files changed, 23 insertions(+), 84 deletions(-) diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig index 220a58cf0a44..f7dd82ef0b9c 100644 --- a/drivers/firmware/Kconfig +++ b/drivers/firmware/Kconfig @@ -203,9 +203,7 @@ config INTEL_STRATIX10_RSU Say Y here if you want Intel RSU support. config QCOM_SCM - tristate "Qcom SCM driver" - depends on ARM || ARM64 - depends on HAVE_ARM_SMCCC + tristate select RESET_CONTROLLER config QCOM_SCM_DOWNLOAD_MODE_DEFAULT diff --git a/drivers/gpu/drm/msm/Kconfig b/drivers/gpu/drm/msm/Kconfig index e9c6af78b1d7..3ddf739a6f9b 100644 --- a/drivers/gpu/drm/msm/Kconfig +++ b/drivers/gpu/drm/msm/Kconfig @@ -17,7 +17,7 @@ config DRM_MSM select DRM_SCHED select SHMEM select TMPFS - select QCOM_SCM if ARCH_QCOM + select QCOM_SCM select WANT_DEV_COREDUMP select SND_SOC_HDMI_CODEC if SND_SOC select SYNC_FILE @@ -55,7 +55,7 @@ config DRM_MSM_GPU_SUDO config DRM_MSM_HDMI_HDCP bool "Enable HDMI HDCP support in MSM DRM driver" - depends on DRM_MSM && QCOM_SCM + depends on DRM_MSM default y help Choose this option to enable HDCP state machine diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 124c41adeca1..989c83acbfee 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -308,7 +308,7 @@ config APPLE_DART config ARM_SMMU tristate "ARM Ltd. System MMU (SMMU) Support" depends on ARM64 || ARM || (COMPILE_TEST && !GENERIC_ATOMIC64) - depends on QCOM_SCM || !QCOM_SCM #if QCOM_SCM=m this can't be =y + select QCOM_SCM select IOMMU_API select IOMMU_IO_PGTABLE_LPAE select ARM_DMA_USE_IOMMU if ARM diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig index 157c924686e4..80321e03809a 100644 --- a/drivers/media/platform/Kconfig +++ b/drivers/media/platform/Kconfig @@ -565,7 +565,7 @@ config VIDEO_QCOM_VENUS depends on VIDEO_DEV && VIDEO_V4L2 && QCOM_SMEM depends on (ARCH_QCOM && IOMMU_DMA) || COMPILE_TEST select QCOM_MDT_LOADER if ARCH_QCOM - select QCOM_SCM if ARCH_QCOM + select QCOM_SCM select VIDEOBUF2_DMA_CONTIG select V4L2_MEM2MEM_DEV help diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig index 71313961cc54..95b3511b0560 100644 --- a/drivers/mmc/host/Kconfig +++ b/drivers/mmc/host/Kconfig @@ -547,7 +547,7 @@ config MMC_SDHCI_MSM depends on MMC_SDHCI_PLTFM select MMC_SDHCI_IO_ACCESSORS select MMC_CQHCI - select QCOM_SCM if MMC_CRYPTO && ARCH_QCOM + select QCOM_SCM if MMC_CRYPTO help This selects the Secure Digital
Re: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces
On Mon, Sep 27, 2021 at 09:42:58AM +, Tian, Kevin wrote: > +static int iommu_dev_viable(struct device *dev, void *data) > +{ > + enum dma_hint hint = *data; > + struct device_driver *drv = READ_ONCE(dev->driver); > + > + /* no conflict if the new device doesn't do DMA */ > + if (hint == DMA_FOR_NONE) > + return 0; > + > + /* no conflict if this device is driver-less, or doesn't do DMA */ > + if (!drv || (drv->dma_hint == DMA_FOR_NONE)) > + return 0; While it is kind of clever to fetch this in the drv like this, the locking just doesn't work right. The group itself needs to have an atomic that encodes what state it is in. You can read the initial state from the drv, under the device_lock, and update the atomic state Also, don't call it "hint", there is nothing hinty about this, it has definitive functional impacts. Greg will want to see a definiate benefit from this extra global code, so be sure to explain about why the BUG_ON is bad, and how driver core involvement is needed to fix it properly. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH V5 12/12] net: netvsc: Add Isolation VM support for netvsc driver
Hi Christoph: Gentile ping. The swiotlb and shared memory mapping changes in this patchset needs your reivew. Could you have a look? Thanks. On 9/22/2021 6:34 PM, Tianyu Lan wrote: Hi Christoph: This patch follows your purposal in the previous discussion. Could you have a look? "use vmap_pfn as in the current series. But in that case I think we should get rid of the other mapping created by vmalloc. I though a bit about finding a way to apply the offset in vmalloc itself, but I think it would be too invasive to the normal fast path. So the other sub-option would be to allocate the pages manually (maybe even using high order allocations to reduce TLB pressure) and then remap them(https://lkml.org/lkml/2021/9/2/112) Otherwise, I merge your previous change for swiotlb into patch 9 “x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM” You previous change link.(http://git.infradead.org/users/hch/misc.git/commit/8248f295928aded3364a1e54a4e0022e93d3610c) Please have a look. Thanks. On 9/16/2021 12:21 AM, Michael Kelley wrote: From: Tianyu Lan Sent: Tuesday, September 14, 2021 6:39 AM In Isolation VM, all shared memory with host needs to mark visible to host via hvcall. vmbus_establish_gpadl() has already done it for netvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_ pagebuffer() stills need to be handled. Use DMA API to map/umap these memory during sending/receiving packet and Hyper-V swiotlb bounce buffer dma address will be returned. The swiotlb bounce buffer has been masked to be visible to host during boot up. Allocate rx/tx ring buffer via alloc_pages() in Isolation VM and map these pages via vmap(). After calling vmbus_establish_gpadl() which marks these pages visible to host, unmap these pages to release the virtual address mapped with physical address below shared_gpa_boundary and map them in the extra address space via vmap_pfn(). Signed-off-by: Tianyu Lan --- Change since v4: * Allocate rx/tx ring buffer via alloc_pages() in Isolation VM * Map pages after calling vmbus_establish_gpadl(). * set dma_set_min_align_mask for netvsc driver. Change since v3: * Add comment to explain why not to use dma_map_sg() * Fix some error handle. --- drivers/net/hyperv/hyperv_net.h | 7 + drivers/net/hyperv/netvsc.c | 287 +- drivers/net/hyperv/netvsc_drv.c | 1 + drivers/net/hyperv/rndis_filter.c | 2 + include/linux/hyperv.h | 5 + 5 files changed, 296 insertions(+), 6 deletions(-) diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h index 315278a7cf88..87e8c74398a5 100644 --- a/drivers/net/hyperv/hyperv_net.h +++ b/drivers/net/hyperv/hyperv_net.h @@ -164,6 +164,7 @@ struct hv_netvsc_packet { u32 total_bytes; u32 send_buf_index; u32 total_data_buflen; + struct hv_dma_range *dma_range; }; #define NETVSC_HASH_KEYLEN 40 @@ -1074,6 +1075,8 @@ struct netvsc_device { /* Receive buffer allocated by us but manages by NetVSP */ void *recv_buf; + struct page **recv_pages; + u32 recv_page_count; u32 recv_buf_size; /* allocated bytes */ struct vmbus_gpadl recv_buf_gpadl_handle; u32 recv_section_cnt; @@ -1082,6 +1085,8 @@ struct netvsc_device { /* Send buffer allocated by us */ void *send_buf; + struct page **send_pages; + u32 send_page_count; u32 send_buf_size; struct vmbus_gpadl send_buf_gpadl_handle; u32 send_section_cnt; @@ -1731,4 +1736,6 @@ struct rndis_message { #define RETRY_US_HI 1 #define RETRY_MAX 2000 /* >10 sec */ +void netvsc_dma_unmap(struct hv_device *hv_dev, + struct hv_netvsc_packet *packet); #endif /* _HYPERV_NET_H */ diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index 1f87e570ed2b..7d5254bf043e 100644 --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include @@ -150,11 +151,33 @@ static void free_netvsc_device(struct rcu_head *head) { struct netvsc_device *nvdev = container_of(head, struct netvsc_device, rcu); + unsigned int alloc_unit; int i; kfree(nvdev->extension); - vfree(nvdev->recv_buf); - vfree(nvdev->send_buf); + + if (nvdev->recv_pages) { + alloc_unit = (nvdev->recv_buf_size / + nvdev->recv_page_count) >> PAGE_SHIFT; + + vunmap(nvdev->recv_buf); + for (i = 0; i < nvdev->recv_page_count; i++) + __free_pages(nvdev->recv_pages[i], alloc_unit); + } else { + vfree(nvdev->recv_buf); + } + + if (nvdev->send_pages) { + alloc_unit = (nvdev->send_buf_size / + nvdev->send_page_count) >> PAGE_SHIFT; + + vunmap(nvdev->send_buf); + for (i = 0; i < nvdev->send_page_count; i++) + __free_pages(nvdev->send_pages[i],
Re: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces
On Mon, Sep 27, 2021 at 01:00:08PM +, Tian, Kevin wrote: > > I think for such a narrow usage you should not change the struct > > device_driver. Just have pci_stub call a function to flip back to user > > mode. > > Here we want to ensure that kernel dma should be blocked > if the group is already marked for user-dma. If we just blindly > do it for any driver at this point (as you commented earlier): > > + ret = iommu_set_kernel_ownership(dev); > + if (ret) > + return ret; > > how would pci-stub reach its function to indicate that it doesn't > do dma and flip back? > Do you envision a simpler policy that no driver can be bound > to the group if it's already set for user-dma? what about vfio-pci > itself? Yes.. I'm not sure there is a good use case to allow the stub drivers to load/unload while a VFIO is running. At least, not a strong enough one to justify a global change to the driver core.. > > > +static int iommu_dev_viable(struct device *dev, void *data) > > > +{ > > > + enum dma_hint hint = *data; > > > + struct device_driver *drv = READ_ONCE(dev->driver); > > > > Especially since this isn't locked properly or safe. > > I have the same worry when copying from vfio. Not sure how > vfio gets safe with this approach... Fixing the locking in vfio_dev_viable is part of deleting the unbound list. Once it properly uses the device_lock and doesn't race with the driver core like this things are much better. Don't copy this stuff into the iommu core without fixing it. https://github.com/jgunthorpe/linux/commit/fa6abb318ccca114da12c0b5b123c99131ace926 https://github.com/jgunthorpe/linux/commit/45980bd90b023d1eea56df70d1c395bdf4cc7cf1 I can't remember if the above is contingent on some of the mdev cleanups or not.. Have to get back to it. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces
> From: Lu Baolu > Sent: Monday, September 27, 2021 7:34 PM > > On 2021/9/27 17:42, Tian, Kevin wrote: > > +int iommu_device_set_dma_hint(struct device *dev, enum dma_hint hint) > > +{ > > + struct iommu_group *group; > > + int ret; > > + > > + group = iommu_group_get(dev); > > + /* not an iommu-probed device */ > > + if (!group) > > + return 0; > > + > > + mutex_lock(>mutex); > > + ret = __iommu_group_viable(group, hint); > > + mutex_unlock(>mutex); > > + > > + iommu_group_put(group); > > + return ret; > > +} > > Conceptually, we could also move iommu_deferred_attach() from > iommu_dma_ops here to save unnecessary checks in the hot DMA API > paths? > Yes, it's possible. But just be curious, why doesn't iommu core manage deferred_attach when receiving BOUND_DRIVER event? Is there other implication that deferred attach cannot be done at driver binding time? Thanks Kevin ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces
> From: Jason Gunthorpe > Sent: Monday, September 27, 2021 7:54 PM > > On Mon, Sep 27, 2021 at 09:42:58AM +, Tian, Kevin wrote: > > > From: Jason Gunthorpe > > > Sent: Wednesday, September 22, 2021 8:40 PM > > > > > > > > Ie the basic flow would see the driver core doing some: > > > > > > > > Just double confirm. Is there concern on having the driver core to > > > > call iommu functions? > > > > > > It is always an interesting question, but I'd say iommu is > > > foundantional to Linux and if it needs driver core help it shouldn't > > > be any different from PM, pinctl, or other subsystems that have > > > inserted themselves into the driver core. > > > > > > Something kind of like the below. > > > > > > If I recall, once it is done like this then the entire iommu notifier > > > infrastructure can be ripped out which is a lot of code. > > > > Currently vfio is the only user of this notifier mechanism. Now > > three events are handled in vfio_iommu_group_notifier(): > > > > NOTIFY_ADD_DEVICE: this is basically for some sanity check. suppose > > not required once we handle it cleanly in the iommu/driver core. > > > > NOTIFY_BOUND_DRIVER: the BUG_ON() logic to be fixed by this change. > > > > NOTIFY_UNBOUND_DRIVER: still needs some thoughts. Based on > > the comments the group->unbound_list is used to avoid breaking > > I have a patch series to delete the unbound_list, the scenario you > describe is handled by the device_lock() that's great! > > > diff --git a/drivers/base/dd.c b/drivers/base/dd.c > > index 68ea1f9..826a651 100644 > > +++ b/drivers/base/dd.c > > @@ -566,6 +566,10 @@ static int really_probe(struct device *dev, struct > device_driver *drv) > > goto done; > > } > > > > + ret = iommu_device_set_dma_hint(dev, drv->dma_hint); > > + if (ret) > > + return ret; > > I think for such a narrow usage you should not change the struct > device_driver. Just have pci_stub call a function to flip back to user > mode. Here we want to ensure that kernel dma should be blocked if the group is already marked for user-dma. If we just blindly do it for any driver at this point (as you commented earlier): + ret = iommu_set_kernel_ownership(dev); + if (ret) + return ret; how would pci-stub reach its function to indicate that it doesn't do dma and flip back? Do you envision a simpler policy that no driver can be bound to the group if it's already set for user-dma? what about vfio-pci itself? > > > +static int iommu_dev_viable(struct device *dev, void *data) > > +{ > > + enum dma_hint hint = *data; > > + struct device_driver *drv = READ_ONCE(dev->driver); > > Especially since this isn't locked properly or safe. I have the same worry when copying from vfio. Not sure how vfio gets safe with this approach... Thanks Kevin ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] iommu/mediatek: fix out-of-range warning with clang
From: Arnd Bergmann clang-14 notices that a comparison is never true when CONFIG_PHYS_ADDR_T_64BIT is disabled: drivers/iommu/mtk_iommu.c:553:34: error: result of comparison of constant 5368709120 with expression of type 'phys_addr_t' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare] if (dom->data->enable_4GB && pa >= MTK_IOMMU_4GB_MODE_REMAP_BASE) ~~ ^ ~ Add an explicit check for the type of the variable to skip the check and the warning in that case. Fixes: b4dad40e4f35 ("iommu/mediatek: Adjust the PA for the 4GB Mode") Signed-off-by: Arnd Bergmann --- drivers/iommu/mtk_iommu.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index d837adfd1da5..25b834104790 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -550,7 +550,9 @@ static phys_addr_t mtk_iommu_iova_to_phys(struct iommu_domain *domain, phys_addr_t pa; pa = dom->iop->iova_to_phys(dom->iop, iova); - if (dom->data->enable_4GB && pa >= MTK_IOMMU_4GB_MODE_REMAP_BASE) + if (IS_ENABLED(CONFIG_PHYS_ADDR_T_64BIT) && + dom->data->enable_4GB && + pa >= MTK_IOMMU_4GB_MODE_REMAP_BASE) pa &= ~BIT_ULL(32); return pa; -- 2.29.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces
On Mon, Sep 27, 2021 at 09:42:58AM +, Tian, Kevin wrote: > > From: Jason Gunthorpe > > Sent: Wednesday, September 22, 2021 8:40 PM > > > > > > Ie the basic flow would see the driver core doing some: > > > > > > Just double confirm. Is there concern on having the driver core to > > > call iommu functions? > > > > It is always an interesting question, but I'd say iommu is > > foundantional to Linux and if it needs driver core help it shouldn't > > be any different from PM, pinctl, or other subsystems that have > > inserted themselves into the driver core. > > > > Something kind of like the below. > > > > If I recall, once it is done like this then the entire iommu notifier > > infrastructure can be ripped out which is a lot of code. > > Currently vfio is the only user of this notifier mechanism. Now > three events are handled in vfio_iommu_group_notifier(): > > NOTIFY_ADD_DEVICE: this is basically for some sanity check. suppose > not required once we handle it cleanly in the iommu/driver core. > > NOTIFY_BOUND_DRIVER: the BUG_ON() logic to be fixed by this change. > > NOTIFY_UNBOUND_DRIVER: still needs some thoughts. Based on > the comments the group->unbound_list is used to avoid breaking I have a patch series to delete the unbound_list, the scenario you describe is handled by the device_lock() > diff --git a/drivers/base/dd.c b/drivers/base/dd.c > index 68ea1f9..826a651 100644 > +++ b/drivers/base/dd.c > @@ -566,6 +566,10 @@ static int really_probe(struct device *dev, struct > device_driver *drv) > goto done; > } > > + ret = iommu_device_set_dma_hint(dev, drv->dma_hint); > + if (ret) > + return ret; I think for such a narrow usage you should not change the struct device_driver. Just have pci_stub call a function to flip back to user mode. > +static int iommu_dev_viable(struct device *dev, void *data) > +{ > + enum dma_hint hint = *data; > + struct device_driver *drv = READ_ONCE(dev->driver); Especially since this isn't locked properly or safe. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces
On 2021/9/27 17:42, Tian, Kevin wrote: +int iommu_device_set_dma_hint(struct device *dev, enum dma_hint hint) +{ + struct iommu_group *group; + int ret; + + group = iommu_group_get(dev); + /* not an iommu-probed device */ + if (!group) + return 0; + + mutex_lock(>mutex); + ret = __iommu_group_viable(group, hint); + mutex_unlock(>mutex); + + iommu_group_put(group); + return ret; +} Conceptually, we could also move iommu_deferred_attach() from iommu_dma_ops here to save unnecessary checks in the hot DMA API paths? Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces
> From: Jason Gunthorpe > Sent: Wednesday, September 22, 2021 8:40 PM > > > > Ie the basic flow would see the driver core doing some: > > > > Just double confirm. Is there concern on having the driver core to > > call iommu functions? > > It is always an interesting question, but I'd say iommu is > foundantional to Linux and if it needs driver core help it shouldn't > be any different from PM, pinctl, or other subsystems that have > inserted themselves into the driver core. > > Something kind of like the below. > > If I recall, once it is done like this then the entire iommu notifier > infrastructure can be ripped out which is a lot of code. Currently vfio is the only user of this notifier mechanism. Now three events are handled in vfio_iommu_group_notifier(): NOTIFY_ADD_DEVICE: this is basically for some sanity check. suppose not required once we handle it cleanly in the iommu/driver core. NOTIFY_BOUND_DRIVER: the BUG_ON() logic to be fixed by this change. NOTIFY_UNBOUND_DRIVER: still needs some thoughts. Based on the comments the group->unbound_list is used to avoid breaking group viability check between vfio_unregister_group_dev() and final dev/drv teardown. within that small window the device is not tracked by vfio group but is still bound to a driver (e.g. vfio-pci itself), while an external group user may hold a reference to the group. Possibly it's not required now with the new mechanism as we rely on init/exit_user_dma() as the single switch to claim/ withdraw the group ownership. As long as exit_user_dma() is not called until vfio_group_release(), above small window is covered thus no need to maintain a unbound_list. But anyway since this corner case is tricky, will think more in case of any oversight. > > > diff --git a/drivers/base/dd.c b/drivers/base/dd.c > index 68ea1f949daa90..e39612c99c6123 100644 > --- a/drivers/base/dd.c > +++ b/drivers/base/dd.c > @@ -566,6 +566,10 @@ static int really_probe(struct device *dev, struct > device_driver *drv) > goto done; > } > > + ret = iommu_set_kernel_ownership(dev); > + if (ret) > + return ret; > + > re_probe: > dev->driver = drv; > > @@ -673,6 +677,7 @@ static int really_probe(struct device *dev, struct > device_driver *drv) > dev->pm_domain->dismiss(dev); > pm_runtime_reinit(dev); > dev_pm_set_driver_flags(dev, 0); > + iommu_release_kernel_ownership(dev); > done: > return ret; > } > @@ -1214,6 +1219,7 @@ static void __device_release_driver(struct device > *dev, struct device *parent) > dev->pm_domain->dismiss(dev); > pm_runtime_reinit(dev); > dev_pm_set_driver_flags(dev, 0); > + iommu_release_kernel_ownership(dev); > > klist_remove(>p->knode_driver); > device_pm_check_callbacks(dev); I expanded above into below conceptual draft. Please help check whether it matches your thought: diff --git a/drivers/base/dd.c b/drivers/base/dd.c index 68ea1f9..826a651 100644 --- a/drivers/base/dd.c +++ b/drivers/base/dd.c @@ -566,6 +566,10 @@ static int really_probe(struct device *dev, struct device_driver *drv) goto done; } + ret = iommu_device_set_dma_hint(dev, drv->dma_hint); + if (ret) + return ret; + re_probe: dev->driver = drv; @@ -673,6 +677,7 @@ static int really_probe(struct device *dev, struct device_driver *drv) dev->pm_domain->dismiss(dev); pm_runtime_reinit(dev); dev_pm_set_driver_flags(dev, 0); + iommu_device_clear_dma_hint(dev); done: return ret; } @@ -1214,6 +1219,7 @@ static void __device_release_driver(struct device *dev, struct device *parent) dev->pm_domain->dismiss(dev); pm_runtime_reinit(dev); dev_pm_set_driver_flags(dev, 0); + iommu_device_clear_dma_hint(dev); klist_remove(>p->knode_driver); device_pm_check_callbacks(dev); diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 3303d70..b12f335 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1064,6 +1064,104 @@ void iommu_group_put(struct iommu_group *group) } EXPORT_SYMBOL_GPL(iommu_group_put); +static int iommu_dev_viable(struct device *dev, void *data) +{ + enum dma_hint hint = *data; + struct device_driver *drv = READ_ONCE(dev->driver); + + /* no conflict if the new device doesn't do DMA */ + if (hint == DMA_FOR_NONE) + return 0; + + /* no conflict if this device is driver-less, or doesn't do DMA */ + if (!drv || (drv->dma_hint == DMA_FOR_NONE)) + return 0; + + /* kernel dma and user dma are exclusive */ + if (hint != drv->dma_hint) + return -EINVAL; + + /* +* devices in the group could be bound to different user-dma