Re: [PATCH] iommu/arm-smmu: Add clk_bulk_{prepare/unprepare} to system pm callbacks
On 2021-08-02 21:42, Will Deacon wrote:
> On Tue, Jul 27, 2021 at 03:03:22PM +0530, Sai Prakash Ranjan wrote:
> > Some clocks for SMMU can have parent as XO such as gpu_cc_hub_cx_int_clk
> > of GPU SMMU in QTI SC7280 SoC and in order to enter deep sleep states in
> > such cases, we would need to drop the XO clock vote in unprepare call and
> > this unprepare callback for XO is in RPMh (Resource Power Manager-Hardened)
> > clock driver which controls RPMh managed clock resources for new QTI SoCs
> > and is a blocking call.
> >
> > Given we cannot have sleeping calls such as clk_bulk_prepare() and
> > clk_bulk_unprepare() in arm-smmu runtime pm callbacks since the iommu
> > operations like map and unmap can be in atomic context and are in the
> > fast path, add this prepare and unprepare call to drop the XO vote only
> > for system pm callbacks since it is not a fast path and we expect the
> > system to enter deep sleep states with system pm as opposed to runtime pm.
> >
> > This is a similar sequence of clock requests (prepare,enable and
> > disable,unprepare) as in arm-smmu probe and remove.
> >
> > Signed-off-by: Sai Prakash Ranjan
> > Co-developed-by: Rajendra Nayak
> > Signed-off-by: Rajendra Nayak
> > ---
> >  drivers/iommu/arm/arm-smmu/arm-smmu.c | 20 ++--
> >  1 file changed, 18 insertions(+), 2 deletions(-)
>
> [+Rob]
>
> How does this work with that funny GPU which writes to the SMMU registers
> directly? Does the SMMU need to remain independently clocked for that to
> work or is it all in the same clock domain?

As Rob mentioned, device link should take care of all the dependencies
between SMMU and its consumers. But I am not sure how the question relates
to this patch, as this change is for system pm and not runtime pm; it is
exactly the sequence of SMMU probe/remove, so if that currently works for
that GPU SMMU, it should work just fine for system suspend and resume as
well.
> > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > index d3c6f54110a5..9561ba4c5d39 100644
> > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > @@ -2277,6 +2277,13 @@ static int __maybe_unused arm_smmu_runtime_suspend(struct device *dev)
> >
> >  static int __maybe_unused arm_smmu_pm_resume(struct device *dev)
> >  {
> > +	int ret;
> > +	struct arm_smmu_device *smmu = dev_get_drvdata(dev);
> > +
> > +	ret = clk_bulk_prepare(smmu->num_clks, smmu->clks);
> > +	if (ret)
> > +		return ret;
> > +
> >  	if (pm_runtime_suspended(dev))
> >  		return 0;
>
> If we subsequently fail to enable the clks in arm_smmu_runtime_resume()
> should we unprepare them again?

If we are unable to turn on the clks then it's fatal and we will not live
for long.

Thanks,
Sai

> Will
>
> > @@ -2285,10 +2292,19 @@ static int __maybe_unused arm_smmu_pm_resume(struct device *dev)
> >
> >  static int __maybe_unused arm_smmu_pm_suspend(struct device *dev)
> >  {
> > +	int ret = 0;
> > +	struct arm_smmu_device *smmu = dev_get_drvdata(dev);
> > +
> >  	if (pm_runtime_suspended(dev))
> > -		return 0;
> > +		goto clk_unprepare;
> >
> > -	return arm_smmu_runtime_suspend(dev);
> > +	ret = arm_smmu_runtime_suspend(dev);
> > +	if (ret)
> > +		return ret;
> > +
> > +clk_unprepare:
> > +	clk_bulk_unprepare(smmu->num_clks, smmu->clks);
> > +	return ret;
> >  }
> >
> >  static const struct dev_pm_ops arm_smmu_pm_ops = {
> > --
> > QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> > of Code Aurora Forum, hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH] iommu/iova: kmemleak when disable SRIOV.
[AMD Official Use Only]

Hi Chris,

I hit a kmemleak with your following patch, can you help to fix it?
According to the info in this thread, it seems the patch hasn't been merged
into the iommu mainline branch, but I can get your patch from my kernel
(5.11.0):

commit 48a64dd561a53fb809e3f2210faf5dd442cfc56d
Author: Chris Wilson
Date:   Sat Jan 16 11:10:35 2021 +

    iommu/iova: Use bottom-up allocation for DMA32

    Since DMA32 allocations are currently allocated top-down from the 4G
    boundary, this causes fragmentation and reduces the maximum allocation
    size. On some systems, the dma address space may be extremely
    constrained by PCIe windows, requiring a stronger anti-fragmentation
    strategy.

    Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2929
    Signed-off-by: Chris Wilson

--
BW
Pengju Zhou

> -----Original Message-----
> From: Robin Murphy
> Sent: Tuesday, July 27, 2021 10:23 PM
> To: Zhou, Peng Ju ; iommu@lists.linux-foundation.org
> Cc: Deucher, Alexander ; Wang, Yin ; w...@kernel.org
> Subject: Re: [PATCH] iommu/iova: kmemleak when disable SRIOV.
>
> On 2021-07-27 15:05, Zhou, Peng Ju wrote:
> > [AMD Official Use Only]
> >
> > Hi Robin
> > The patch which add "head" :
> >
> > commit 48a64dd561a53fb809e3f2210faf5dd442cfc56d
> > Author: Chris Wilson
> > Date:   Sat Jan 16 11:10:35 2021 +
> >
> >     iommu/iova: Use bottom-up allocation for DMA32
> >
> >     Since DMA32 allocations are currently allocated top-down from the 4G
> >     boundary, this causes fragmentation and reduces the maximum allocation
> >     size. On some systems, the dma address space may be extremely
> >     constrained by PCIe windows, requiring a stronger anti-fragmentation
> >     strategy.
> >
> >     Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2929
> >     Signed-off-by: Chris Wilson
>
> ...which is not in mainline. I've never even seen it posted for review.
> In fact two search engines can't seem to find any trace of that SHA or patch
> subject on the internet at all.
>
> By all means tell Chris that his patch, wherever you got it from, is broken,
> but once again there's nothing the upstream maintainers/reviewers can do
> about code which isn't upstream.
>
> Thanks,
> Robin.
>
> > --
> > BW
> > Pengju Zhou
> >
> >> -----Original Message-----
> >> From: Robin Murphy
> >> Sent: Tuesday, July 27, 2021 4:52 PM
> >> To: Zhou, Peng Ju ; iommu@lists.linux-foundation.org
> >> Cc: Deucher, Alexander ; Wang, Yin ; w...@kernel.org
> >> Subject: Re: [PATCH] iommu/iova: kmemleak when disable SRIOV.
> >>
> >> On 2021-07-27 05:46, Zhou, Peng Ju wrote:
> >>> [AMD Official Use Only]
> >>>
> >>> Hi Robin
> >>> 1. it is not a good manner to free a statically allocated object (in
> >>> this case, it
> >> is iovad->head) dynamically, even though the free only occurred when
> >> shutting down the OS in most cases.
> >>> 2. the kmemleak occurred when disabling SRIOV (removing a PCIe device);
> >>> I posted the log in the following. In the log, the line "kmemleak:
> >>> Found object by alias at 0x9221ae647050" means the OS frees a
> >>> non-existing object (iovad->head) which was added to the RB_TREE in the
> >>> function init_iova_domain
> >>
> >> Sure, it was apparent enough what the bug was; my point is that the
> >> bug does not exist in mainline.
> >> This is the current mainline definition of struct iova_domain:
> >>
> >> /* holds all the iova translations for a domain */
> >> struct iova_domain {
> >> 	spinlock_t	iova_rbtree_lock; /* Lock to protect update of rbtree */
> >> 	struct rb_root	rbroot;		/* iova domain rbtree root */
> >> 	struct rb_node	*cached_node;	/* Save last alloced node */
> >> 	struct rb_node	*cached32_node;	/* Save last 32-bit alloced node */
> >> 	unsigned long	granule;	/* pfn granularity for this domain */
> >> 	unsigned long	start_pfn;	/* Lower limit for this domain */
> >> 	unsigned long	dma_32bit_pfn;
> >> 	unsigned long	max32_alloc_size; /* Size of last failed allocation */
> >> 	struct iova_fq __percpu *fq;	/* Flush Queue */
> >>
> >> 	atomic64_t	fq_flush_start_cnt;	/* Number of TLB flushes that
> >> 						   have been started */
> >>
> >> 	atomic64_t	fq_flush_finish_cnt;	/* Number of TLB flushes that
> >> 						   have been finished */
Re: [PATCHv3] iommu/arm-smmu: Optimize ->tlb_flush_walk() for qcom implementation
On 2021-08-02 21:13, Will Deacon wrote:
> On Wed, Jun 23, 2021 at 07:12:01PM +0530, Sai Prakash Ranjan wrote:
> > Currently for iommu_unmap() of a large scatter-gather list with page size
> > elements, the majority of time is spent in flushing of partial walks in
> > __arm_lpae_unmap() which is a VA based TLB invalidation invalidating
> > page-by-page on iommus like arm-smmu-v2 (TLBIVA).
> >
> > For example: to unmap a 32MB scatter-gather list with page size elements
> > (8192 entries), there are 16 2MB buffer unmaps based on the pgsize (2MB
> > for 4K granule) and each 2MB will further result in 512 TLBIVAs (2MB/4K),
> > resulting in a total of 8192 TLBIVAs (512*16) for the 16 2MB chunks,
> > causing a huge overhead.
> >
> > On qcom implementation, there are several performance improvements for
> > TLB cache invalidations in HW like wait-for-safe (for realtime clients
> > such as camera and display) and a few others to allow for cache
> > lookups/updates when TLBI is in progress for the same context bank. So
> > the cost of over-invalidation is less compared to the unmap latency on
> > several usecases like camera which deal with large buffers. So, ASID
> > based TLB invalidations (TLBIASID) can be used to invalidate the entire
> > context for partial walk flush, thereby improving the unmap latency.
> >
> > Non-strict mode can use this by default for all platforms given it's all
> > about over-invalidation saving time on individual unmaps and is
> > non-deterministic generally.
> >
> > For this example of a 32MB scatter-gather list unmap, this change results
> > in just 16 ASID based TLB invalidations (TLBIASIDs) as opposed to 8192
> > TLBIVAs, thereby improving the performance of unmaps drastically.
> > Test on QTI SM8150 SoC for 10 iterations of iommu_{map_sg}/unmap
> > (average over 10 iterations):
> >
> > Before this optimization:
> >
> >     size        iommu_map_sg      iommu_unmap
> >       4K            2.067 us         1.854 us
> >      64K            9.598 us         8.802 us
> >       1M          148.890 us       130.718 us
> >       2M          305.864 us        67.291 us
> >      12M         1793.604 us       390.838 us
> >      16M         2386.848 us       518.187 us
> >      24M         3563.296 us       775.989 us
> >      32M         4747.171 us      1033.364 us
> >
> > After this optimization:
> >
> >     size        iommu_map_sg      iommu_unmap
> >       4K            1.723 us         1.765 us
> >      64K            9.880 us         8.869 us
> >       1M          155.364 us       135.223 us
> >       2M          303.906 us         5.385 us
> >      12M         1786.557 us        21.250 us
> >      16M         2391.890 us        27.437 us
> >      24M         3570.895 us        39.937 us
> >      32M         4755.234 us        51.797 us
> >
> > This is further reduced once the map/unmap_pages() support gets in,
> > which will result in just 1 TLBIASID as compared to 16 TLBIASIDs.
> >
> > Real world data also shows a big difference in unmap performance:
> > there were reports of camera frame drops because of the high overhead
> > in iommu unmap without this optimization, with frequent unmaps issued
> > by camera of about 100MB/s taking more than 100ms, thereby causing
> > frame drops.
> > Signed-off-by: Sai Prakash Ranjan
> > ---
> > Changes in v3:
> >  * Move the logic to arm-smmu driver from io-pgtable (Robin)
> >  * Use a new set of iommu_flush_ops->arm_smmu_s1_tlb_impl_ops and
> >    use it for qcom impl
> >
> > Changes in v2:
> >  * Add a quirk to choose tlb_flush_all in partial walk flush
> >  * Set the quirk for QTI SoC implementation
> > ---
> >  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 13 +
> >  drivers/iommu/arm/arm-smmu/arm-smmu.c      | 17 -
> >  2 files changed, 29 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > index 7771d40176de..218c71465819 100644
> > --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > @@ -10,6 +10,8 @@
> >
> >  #include "arm-smmu.h"
> >
> > +extern const struct iommu_flush_ops arm_smmu_s1_tlb_impl_ops;
> > +
> >  struct qcom_smmu {
> >  	struct arm_smmu_device smmu;
> >  	bool bypass_quirk;
> > @@ -146,6 +148,8 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
> >  {
> >  	struct adreno_smmu_priv *priv;
> >
> > +	pgtbl_cfg->tlb = &arm_smmu_s1_tlb_impl_ops;
> > +
> >  	/* Only enable split pagetables for the GPU device (SID 0) */
> >  	if (!qcom_adreno_smmu_is_gpu_device(dev))
> >  		return 0;
> > @@ -185,6 +189,14 @@ static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
> >  	{ }
> >  };
> >
> > +static int qcom_smmu_init_context(struct arm_smmu_domain *smmu_domain,
> > +		struct io_pgtable_cfg *pgtbl_cfg, struct device *dev)
> > +{
> > +	pgtbl_cfg->tlb = &arm_smmu_s1_tlb_impl_ops;
> > +
> > +	return 0;
> > +}
> > +
> >  static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)
> >  {
> >  	unsigned int last_s2cr = ARM_SMMU_GR0_S2CR(smmu->num_mapping_groups - 1);
> > @@ -308,6 +320,7 @@ static int qcom_smmu500_reset(struct arm_smmu_device *smmu)
> >  }
> >
> >  static const struct arm_smmu_impl qcom_smmu_impl = {
> > +	.init_context = qcom_smmu_init_context,
Re: [PATCH v2 11/11] memory: mtk-smi: mt8195: Add initial setting for smi-larb
Hi,

On Thu, Jul 29, 2021 at 2:41 PM Yong Wu wrote:
>
> Hi Ikjoon,
>
> Just a ping.
>
> On Thu, 2021-07-22 at 14:38 +0800, Yong Wu wrote:
> > On Wed, 2021-07-21 at 21:40 +0800, Ikjoon Jang wrote:
> > > On Thu, Jul 15, 2021 at 8:23 PM Yong Wu wrote:
> > > >
> > > > To improve the performance, we add some initial settings for smi larbs.
> > > > There are two parts:
> > > > 1) Each port has a special ostd (outstanding) value in each larb.
> > > > 2) Two general settings for each larb.

Honestly, I think nobody outside Mediatek will understand this. Can you
please update this to be more generic? Like "Apply default bus settings
for mt8195, without this, XXX problems can happen.."? Or for example,
adding brief descriptions on what MTK_SMI_FLAG_LARB_THRT_EN and
MTK_SMI_FLAG_LARB_SW_FLAG are for would be better, if that's available.

> > > >
> > > > In some SoCs, this setting may be changed dynamically for some
> > > > special cases like 4K, and this initial setting is enough in mt8195.
> > > >
> > > > Signed-off-by: Yong Wu
> > > > ---
> > [...]
> > > >  struct mtk_smi {
> > > > @@ -213,12 +228,22 @@ static void mtk_smi_larb_config_port_mt8173(struct device *dev)
> > > >
> > > >  static void mtk_smi_larb_config_port_gen2_general(struct device *dev)
> > > >  {
> > > >  	struct mtk_smi_larb *larb = dev_get_drvdata(dev);
> > > > -	u32 reg;
> > > > +	u32 reg, flags_general = larb->larb_gen->flags_general;
> > > > +	const u8 *larbostd = larb->larb_gen->ostd[larb->larbid];
> > > >  	int i;
> > > >
> > > >  	if (BIT(larb->larbid) & larb->larb_gen->larb_direct_to_common_mask)
> > > >  		return;
> > > >
> > > > +	if (MTK_SMI_CAPS(flags_general, MTK_SMI_FLAG_LARB_THRT_EN))
> > > > +		writel_relaxed(SMI_LARB_THRT_EN, larb->base + SMI_LARB_CMD_THRT_CON);
> > > > +
> > > > +	if (MTK_SMI_CAPS(flags_general, MTK_SMI_FLAG_LARB_SW_FLAG))
> > > > +		writel_relaxed(SMI_LARB_SW_FLAG_1, larb->base + SMI_LARB_SW_FLAG);
> > > > +
> > > > +	for (i = 0; i < SMI_LARB_PORT_NR_MAX && larbostd && !!larbostd[i]; i++)
> > > > +		writel_relaxed(larbostd[i], larb->base + SMI_LARB_OSTDL_PORTx(i));
> > >
> > > Do all other mtk platforms' larbs have the same format for
> > > SMI_LARB_OSTDL_PORTx() registers at the same offset? Or is this a
> > > unique feature of mt8195?
> >
> > All the other platforms' larbs have the same format at the same offset.
>
> In this case, do you have some other further comment? If not, I will
> keep the current solution for this.

Sorry for the late response, I have no further comments or any objections
here. Please go ahead with v3.

I just had no idea about the register definitions and wanted to be sure
that the newly added register definitions are common to all MTK platforms.

Thanks!

> Thanks.
> > > > +
> > > >  	for_each_set_bit(i, (unsigned long *)larb->mmu, 32) {
> > > >  		reg = readl_relaxed(larb->base + SMI_LARB_NONSEC_CON(i));
> > > >  		reg |= F_MMU_EN;
> > > > @@ -227,6 +252,51 @@ static void mtk_smi_larb_config_port_gen2_general(struct device *dev)
> > > >  	}
> > > >  }
> > > >
> > > [...]
RE: [RFC v2] /dev/iommu uAPI proposal
> From: David Gibson
> Sent: Tuesday, August 3, 2021 9:51 AM
>
> On Wed, Jul 28, 2021 at 04:04:24AM +, Tian, Kevin wrote:
> > Hi, David,
> >
> > > From: David Gibson
> > > Sent: Monday, July 26, 2021 12:51 PM
> > >
> > > On Fri, Jul 09, 2021 at 07:48:44AM +, Tian, Kevin wrote:
> > > > /dev/iommu provides a unified interface for managing I/O page tables
> > > > for devices assigned to userspace. Device passthrough frameworks
> > > > (VFIO, vDPA, etc.) are expected to use this interface instead of
> > > > creating their own logic to isolate untrusted device DMAs initiated
> > > > by userspace.
> > > >
> > > > This proposal describes the uAPI of /dev/iommu and also sample
> > > > sequences with VFIO as example in typical usages. The driver-facing
> > > > kernel API provided by the iommu layer is still TBD, which can be
> > > > discussed after consensus is made on this uAPI.
> > > >
> > > > It's based on a lengthy discussion starting from here:
> > > > https://lore.kernel.org/linux-iommu/20210330132830.go2356...@nvidia.com/
> > > >
> > > > v1 can be found here:
> > > > https://lore.kernel.org/linux-iommu/PH0PR12MB54811863B392C644E5365446DC3E9@PH0PR12MB5481.namprd12.prod.outlook.com/T/
> > > >
> > > > This doc is also tracked on github, though it's not very useful for
> > > > v1->v2 given dramatic refactoring:
> > > > https://github.com/luxis1999/dev_iommu_uapi
> > >
> > > Thanks for all your work on this, Kevin. Apart from the actual
> > > semantic improvements, I'm finding v2 significantly easier to read and
> > > understand than v1.
> > >
> > > [snip]
> > > > 1.2. Attach Device to I/O address space
> > > > +++
> > > >
> > > > Device attach/bind is initiated through passthrough framework uAPI.
> > > >
> > > > Device attaching is allowed only after a device is successfully bound
> > > > to the IOMMU fd. User should provide a device cookie when binding the
> > > > device through VFIO uAPI.
> > > > This cookie is used when the user queries
> > > > device capability/format, issues per-device iotlb invalidation and
> > > > receives per-device I/O page fault data via IOMMU fd.
> > > >
> > > > Successful binding puts the device into a security context which
> > > > isolates its DMA from the rest of the system. VFIO should not allow
> > > > user to access the device before binding is completed. Similarly,
> > > > VFIO should prevent the user from unbinding the device before user
> > > > access is withdrawn.
> > > >
> > > > When a device is in an iommu group which contains multiple devices,
> > > > all devices within the group must enter/exit the security context
> > > > together. Please check {1.3} for more info about group isolation via
> > > > this device-centric design.
> > > >
> > > > Successful attaching activates an I/O address space in the IOMMU,
> > > > if the device is not purely software mediated. VFIO must provide
> > > > device specific routing information for where to install the I/O
> > > > page table in the IOMMU for this device. VFIO must also guarantee
> > > > that the attached device is configured to compose DMAs with the
> > > > routing information that is provided in the attaching call. When
> > > > handling DMA requests, IOMMU identifies the target I/O address space
> > > > according to the routing information carried in the request.
> > > > Misconfiguration breaks DMA isolation thus could lead to severe
> > > > security vulnerability.
> > > >
> > > > Routing information is per-device and bus specific. For PCI, it is
> > > > Requester ID (RID) identifying the device plus optional Process
> > > > Address Space ID (PASID). For ARM, it is Stream ID (SID) plus
> > > > optional Sub-Stream ID (SSID). PASID or SSID is used when multiple
> > > > I/O address spaces are enabled on a single device. For simplicity
> > > > and continuity reason the following context uses RID+PASID though
> > > > SID+SSID may sound a clearer naming from device p.o.v.
> > > > We can decide the actual naming when coding.
> > > >
> > > > Because one I/O address space can be attached by multiple devices,
> > > > per-device routing information (plus device cookie) is tracked under
> > > > each IOASID and is used respectively when activating the I/O address
> > > > space in the IOMMU for each attached device.
> > > >
> > > > The device in the /dev/iommu context always refers to a physical one
> > > > (pdev) which is identifiable via RID. Physically each pdev can
> > > > support one default I/O address space (routed via RID) and optionally
> > > > multiple non-default I/O address spaces (via RID+PASID).
> > > >
> > > > The device in VFIO context is a logic concept, being either a
> > > > physical device (pdev) or mediated device (mdev or subdev). Each
> > > > vfio device is represented by RID+cookie in IOMMU fd. User is
> > > > allowed to create one default I/O address space (routed by vRID from
> > > > user p.o.v) per each vfio_device.
Re: [RFC v2] /dev/iommu uAPI proposal
On Fri, Jul 30, 2021 at 11:51:23AM -0300, Jason Gunthorpe wrote:
> On Mon, Jul 26, 2021 at 02:50:48PM +1000, David Gibson wrote:
>
> > That said, I'm still finding the various ways a device can attach to
> > an ioasid pretty confusing. Here are some thoughts on some extra
> > concepts that might make it easier to handle [note, I haven't thought
> > this all the way through so far, so there might be fatal problems with
> > this approach].
>
> I think you've summarized how I've been viewing this problem. All the
> concepts you pointed to should show through in the various APIs at the
> end, one way or another.
>
> How much we need to expose to userspace, I don't know.
>
> Does userspace need to care how the system labels traffic between DMA
> endpoint and the IOASID? At some point maybe yes since stuff like
> PASID does leak out in various spots

Yeah, I'm not sure. I think it probably doesn't for the "main path" of
the API, though we might want to expose that for debugging and some edge
cases.

We *should* however be exposing the address type for each IOAS, since
that affects how your MAP operations will work, as well as what endpoints
are compatible with the IOAS.

> > /dev/iommu would work entirely (or nearly so) in terms of endpoint
> > handles, not device handles. Endpoints are what get bound to an IOAS,
> > and endpoints are what get the user chosen endpoint cookie.
>
> While an accurate modeling of groups, it feels like an
> overcomplication at this point in history where new HW largely doesn't
> need it.

So.. first, is that really true across the board? I expect it's true of
high end server hardware, but for consumer level and embedded hardware as
well? Then there's virtual hardware - I could point to several things
still routinely using emulated PCIe to PCI bridges in qemu.

Second, we can't just ignore older hardware.
> The user interface VFIO and others present is device
> centric; inserting a new endpoint object is going back to some
> kind of group centric view of the world.

Well, kind of, yeah, because I still think the concept has value. Part of
the trouble is that "device" is pretty ambiguous: "device" in the sense of
a PCI register interface may not be the same as "device" in terms of DMA
RID identifiability, which may not be the same as "device" in terms of a
Linux struct device.

> I'd rather deduce the endpoint from a collection of devices than the
> other way around...

Which I think is confusing, and in any case doesn't cover the case of one
"device" with multiple endpoints.

--
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [RFC v2] /dev/iommu uAPI proposal
On Wed, Jul 28, 2021 at 04:04:24AM +, Tian, Kevin wrote:
> Hi, David,
>
> > From: David Gibson
> > Sent: Monday, July 26, 2021 12:51 PM
> >
> > On Fri, Jul 09, 2021 at 07:48:44AM +, Tian, Kevin wrote:
> > > /dev/iommu provides a unified interface for managing I/O page tables
> > > for devices assigned to userspace. Device passthrough frameworks
> > > (VFIO, vDPA, etc.) are expected to use this interface instead of
> > > creating their own logic to isolate untrusted device DMAs initiated
> > > by userspace.
> > >
> > > This proposal describes the uAPI of /dev/iommu and also sample
> > > sequences with VFIO as example in typical usages. The driver-facing
> > > kernel API provided by the iommu layer is still TBD, which can be
> > > discussed after consensus is made on this uAPI.
> > >
> > > It's based on a lengthy discussion starting from here:
> > > https://lore.kernel.org/linux-iommu/20210330132830.go2356...@nvidia.com/
> > >
> > > v1 can be found here:
> > > https://lore.kernel.org/linux-iommu/PH0PR12MB54811863B392C644E5365446DC3E9@PH0PR12MB5481.namprd12.prod.outlook.com/T/
> > >
> > > This doc is also tracked on github, though it's not very useful for
> > > v1->v2 given dramatic refactoring:
> > > https://github.com/luxis1999/dev_iommu_uapi
> >
> > Thanks for all your work on this, Kevin. Apart from the actual
> > semantic improvements, I'm finding v2 significantly easier to read and
> > understand than v1.
> >
> > [snip]
> > > 1.2. Attach Device to I/O address space
> > > +++
> > >
> > > Device attach/bind is initiated through passthrough framework uAPI.
> > >
> > > Device attaching is allowed only after a device is successfully bound
> > > to the IOMMU fd. User should provide a device cookie when binding the
> > > device through VFIO uAPI. This cookie is used when the user queries
> > > device capability/format, issues per-device iotlb invalidation and
> > > receives per-device I/O page fault data via IOMMU fd.
> > >
> > > Successful binding puts the device into a security context which
> > > isolates its DMA from the rest of the system. VFIO should not allow
> > > user to access the device before binding is completed. Similarly,
> > > VFIO should prevent the user from unbinding the device before user
> > > access is withdrawn.
> > >
> > > When a device is in an iommu group which contains multiple devices,
> > > all devices within the group must enter/exit the security context
> > > together. Please check {1.3} for more info about group isolation via
> > > this device-centric design.
> > >
> > > Successful attaching activates an I/O address space in the IOMMU,
> > > if the device is not purely software mediated. VFIO must provide
> > > device specific routing information for where to install the I/O
> > > page table in the IOMMU for this device. VFIO must also guarantee
> > > that the attached device is configured to compose DMAs with the
> > > routing information that is provided in the attaching call. When
> > > handling DMA requests, IOMMU identifies the target I/O address space
> > > according to the routing information carried in the request.
> > > Misconfiguration breaks DMA isolation thus could lead to severe
> > > security vulnerability.
> > >
> > > Routing information is per-device and bus specific. For PCI, it is
> > > Requester ID (RID) identifying the device plus optional Process
> > > Address Space ID (PASID). For ARM, it is Stream ID (SID) plus
> > > optional Sub-Stream ID (SSID). PASID or SSID is used when multiple
> > > I/O address spaces are enabled on a single device. For simplicity
> > > and continuity reason the following context uses RID+PASID though
> > > SID+SSID may sound a clearer naming from device p.o.v. We can decide
> > > the actual naming when coding.
> > >
> > > Because one I/O address space can be attached by multiple devices,
> > > per-device routing information (plus device cookie) is tracked under
> > > each IOASID and is used respectively when activating the I/O address
> > > space in the IOMMU for each attached device.
> > >
> > > The device in the /dev/iommu context always refers to a physical one
> > > (pdev) which is identifiable via RID. Physically each pdev can
> > > support one default I/O address space (routed via RID) and optionally
> > > multiple non-default I/O address spaces (via RID+PASID).
> > >
> > > The device in VFIO context is a logic concept, being either a
> > > physical device (pdev) or mediated device (mdev or subdev). Each
> > > vfio device is represented by RID+cookie in IOMMU fd. User is
> > > allowed to create one default I/O address space (routed by vRID from
> > > user p.o.v) per each vfio_device. VFIO decides the routing
> > > information for this default space based on device type:
> > >
> > > 1) pdev, routed via RID;
> > >
> > > 2) mdev/subdev with IOMMU-enforced DMA isolation, routed via
> > > the parent's RID plus the PASID marking this mdev;
> > >
> > > 3) a purely sw-
Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
On Mon, Aug 2, 2021 at 8:14 AM Will Deacon wrote:
>
> On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon wrote:
> > >
> > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> > > > > > removed the unused IOMMU_SYS_CACHE_ONLY prot flag and along with
> > > > > > it went the memory type setting required for the non-coherent
> > > > > > masters to use system cache. Now that system cache support for
> > > > > > GPU is added, we will need to set the right PTE attribute for GPU
> > > > > > buffers to be sys cached. Without this, the system cache lines are
> > > > > > not allocated for GPU.
> > > > > >
> > > > > > So the patches in this series introduce a new prot flag IOMMU_LLC,
> > > > > > rename IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> > > > > > and make GPU the user of this protection flag.
> > > > >
> > > > > Thank you for the patchset! Are you planning to refresh it, as it
> > > > > does not apply anymore?
> > > >
> > > > I was waiting on Will's reply [1]. If there are no changes needed,
> > > > then I can repost the patch.
> > >
> > > I still think you need to handle the mismatched alias, no? You're
> > > adding a new memory type to the SMMU which doesn't exist on the CPU
> > > side. That can't be right.
> >
> > Just curious, and maybe this is a dumb question, but what is your
> > concern about mismatched aliases? I mean the cache hierarchy on the
> > GPU device side (anything beyond the LLC) is pretty different and
> > doesn't really care about the smmu pgtable attributes..
> If the CPU accesses a shared buffer with different attributes to those
> which the device is using then you fall into the "mismatched memory
> attributes" part of the Arm architecture. It's reasonably unforgiving
> (you should go and read it) and in some cases can apply to speculative
> accesses as well, but the end result is typically loss of coherency.

Ok, I might have a few other sections to read first to decipher the
terminology..

But my understanding of LLC is that it looks just like system memory to
the CPU and GPU (I think that would make it "the point of coherence"
between the GPU and CPU?) If that is true, shouldn't it be invisible from
the point of view of different CPU mapping options?

BR,
-R
Re: [PATCH] iommu/arm-smmu: Add clk_bulk_{prepare/unprepare} to system pm callbacks
On Mon, Aug 2, 2021 at 9:12 AM Will Deacon wrote:
>
> On Tue, Jul 27, 2021 at 03:03:22PM +0530, Sai Prakash Ranjan wrote:
> > Some clocks for SMMU can have parent as XO such as gpu_cc_hub_cx_int_clk
> > of GPU SMMU in QTI SC7280 SoC and in order to enter deep sleep states in
> > such cases, we would need to drop the XO clock vote in unprepare call and
> > this unprepare callback for XO is in RPMh (Resource Power Manager-Hardened)
> > clock driver which controls RPMh managed clock resources for new QTI SoCs
> > and is a blocking call.
> >
> > Given we cannot have a sleeping calls such as clk_bulk_prepare() and
> > clk_bulk_unprepare() in arm-smmu runtime pm callbacks since the iommu
> > operations like map and unmap can be in atomic context and are in fast
> > path, add this prepare and unprepare call to drop the XO vote only for
> > system pm callbacks since it is not a fast path and we expect the system
> > to enter deep sleep states with system pm as opposed to runtime pm.
> >
> > This is a similar sequence of clock requests (prepare,enable and
> > disable,unprepare) in arm-smmu probe and remove.
> >
> > Signed-off-by: Sai Prakash Ranjan
> > Co-developed-by: Rajendra Nayak
> > Signed-off-by: Rajendra Nayak
> > ---
> >  drivers/iommu/arm/arm-smmu/arm-smmu.c | 20 ++--
> >  1 file changed, 18 insertions(+), 2 deletions(-)
>
> [+Rob]
>
> How does this work with that funny GPU which writes to the SMMU registers
> directly? Does the SMMU need to remain independently clocked for that to
> work or is it all in the same clock domain?

AFAIU the device_link stuff should keep the SMMU clocked as long as the
GPU is alive, so I think this should work out ok.. ie. the SMMU won't
suspend while the GPU is not suspended.
BR, -R > > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c > > b/drivers/iommu/arm/arm-smmu/arm-smmu.c > > index d3c6f54110a5..9561ba4c5d39 100644 > > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c > > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c > > @@ -2277,6 +2277,13 @@ static int __maybe_unused > > arm_smmu_runtime_suspend(struct device *dev) > > > > static int __maybe_unused arm_smmu_pm_resume(struct device *dev) > > { > > + int ret; > > + struct arm_smmu_device *smmu = dev_get_drvdata(dev); > > + > > + ret = clk_bulk_prepare(smmu->num_clks, smmu->clks); > > + if (ret) > > + return ret; > > + > > if (pm_runtime_suspended(dev)) > > return 0; > > If we subsequently fail to enable the clks in arm_smmu_runtime_resume() > should we unprepare them again? > > Will > > > @@ -2285,10 +2292,19 @@ static int __maybe_unused arm_smmu_pm_resume(struct > > device *dev) > > > > static int __maybe_unused arm_smmu_pm_suspend(struct device *dev) > > { > > + int ret = 0; > > + struct arm_smmu_device *smmu = dev_get_drvdata(dev); > > + > > if (pm_runtime_suspended(dev)) > > - return 0; > > + goto clk_unprepare; > > > > - return arm_smmu_runtime_suspend(dev); > > + ret = arm_smmu_runtime_suspend(dev); > > + if (ret) > > + return ret; > > + > > +clk_unprepare: > > + clk_bulk_unprepare(smmu->num_clks, smmu->clks); > > + return ret; > > } > > > > static const struct dev_pm_ops arm_smmu_pm_ops = { > > -- > > QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member > > of Code Aurora Forum, hosted by The Linux Foundation > > > ___ > iommu mailing list > iommu@lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/iommu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 0/3] iommu: Enable non-strict DMA on QCom SD/MMC
Hi Rob,

On Mon, Aug 2, 2021 at 5:09 PM Rajat Jain wrote:
>
> Hi Robin, Doug,
>
> On Wed, Jul 14, 2021 at 8:14 AM Doug Anderson wrote:
> >
> > Hi,
> >
> > On Tue, Jul 13, 2021 at 11:07 AM Robin Murphy wrote:
> > >
> > > On 2021-07-08 15:36, Doug Anderson wrote:
> > > [...]
> > > >> Or document for the users that want performance how to
> > > >> change the setting, so that they can decide.
> > > >
> > > > Pushing this to the users can make sense for a Linux distribution but
> > > > probably less sense for an embedded platform. So I'm happy to make
> > > > some way for a user to override this (like via kernel command line),
> > > > but I also strongly believe there should be a default that users don't
> > > > have to futz with that we think is correct.
> > >
> > > FYI I did make progress on the "punt it to userspace" approach. I'm not
> > > posting it even as an RFC yet because I still need to set up a machine
> > > to try actually testing any of it (it's almost certainly broken
> > > somewhere), but in the end it comes out looking surprisingly not too bad
> > > overall. If you're curious to take a look in the meantime I put it here:
> > >
> > > https://gitlab.arm.com/linux-arm/linux-rm/-/commits/iommu/fq

BTW, is there another mirror to this? I (and another colleague) are getting
the following error when trying to clone it:

rajatja@rajat2:~/rob_iommu$ git clone https://git.gitlab.arm.com/linux-arm/linux-rm.git
Cloning into 'linux-rm'...
remote: Enumerating objects: 125712, done.
remote: Counting objects: 100% (125712/125712), done.
remote: Compressing objects: 100% (41203/41203), done.
error: RPC failed; curl 18 transfer closed with outstanding read data remaining
error: 804 bytes of body are still expected
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: fetch-pack: invalid index-pack output
rajatja@rajat2:~/rob_iommu$

We've tried both git and https methods.

> I was wondering if you got any closer to testing / sending it out? I
> looked at the patches and am trying to understand, would they also
> make it possible to convert at runtime, an existing "non-strict"
> domain (for a particular device) into a "strict" domain leaving the
> other devices/domains as-is? Please let me know when you think your
> patches are good to be tested, and I'd also be interested in trying
> them out.
>
> > > Being able to change this at runtime through sysfs sounds great and it
> > > fills all the needs I'm aware of, thanks! In Chrome OS we can just use
> > > this with some udev rules and get everything we need.
>
> I still have another (inverse) use case where this does not work:
> We have an Intel chromebook with the default domain type being
> non-strict. There is an LTE modem (an internal PCI device which cannot
> be marked external), which we'd like to be treated as a "Strict" DMA
> domain.
>
> Do I understand it right that using Rob's patches, I could potentially
> switch the domain to "strict" *after* booting (since we don't use
> initramfs), but by that time, the driver might have already attached
> to the modem device (using "non-strict" domain), and thus the damage
> may have already been done? So perhaps we still need a device property
> that the firmware could use to indicate "strictness" for certain
> devices at boot?
>
> Thanks,
> Rajat
Re: [PATCH v2 0/3] iommu: Enable non-strict DMA on QCom SD/MMC
Hi Robin, Doug, On Wed, Jul 14, 2021 at 8:14 AM Doug Anderson wrote: > > Hi, > > On Tue, Jul 13, 2021 at 11:07 AM Robin Murphy wrote: > > > > On 2021-07-08 15:36, Doug Anderson wrote: > > [...] > > >> Or document for the users that want performance how to > > >> change the setting, so that they can decide. > > > > > > Pushing this to the users can make sense for a Linux distribution but > > > probably less sense for an embedded platform. So I'm happy to make > > > some way for a user to override this (like via kernel command line), > > > but I also strongly believe there should be a default that users don't > > > have to futz with that we think is correct. > > > > FYI I did make progress on the "punt it to userspace" approach. I'm not > > posting it even as an RFC yet because I still need to set up a machine > > to try actually testing any of it (it's almost certainly broken > > somewhere), but in the end it comes out looking surprisingly not too bad > > overall. If you're curious to take a look in the meantime I put it here: > > > > https://gitlab.arm.com/linux-arm/linux-rm/-/commits/iommu/fq I was wondering if you got any closer to testing / sending it out? I looked at the patches and am trying to understand, would they also make it possible to convert at runtime, an existing "non-strict" domain (for a particular device) into a "strict" domain leaving the other devices/domains as-is? Please let me know when you think your patches are good to be tested, and I'd also be interested in trying them out. > > Being able to change this at runtime through sysfs sounds great and it > fills all the needs I'm aware of, thanks! In Chrome OS we can just use > this with some udev rules and get everything we need. I still have another (inverse) use case where this does not work: We have an Intel chromebook with the default domain type being non-strict. 
There is an LTE modem (an internal PCI device which cannot be marked external), which we'd like to be treated as a "Strict" DMA domain. Do I understand it right that using Rob's patches, I could potentially switch the domain to "strict" *after* booting (since we don't use initramfs), but by that time, the driver might have already attached to the modem device (using "non-strict" domain), and thus the damage may have already been done? So perhaps we still need a device property that the firmware could use to indicate "strictness" for certain devices at boot? Thanks, Rajat ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 3/5] dma-mapping: Enable global non-coherent pool support for RISC-V
On Tue, Jul 27, 2021 at 1:52 AM Christoph Hellwig wrote: > > On Mon, Jul 26, 2021 at 03:47:54PM -0700, Atish Patra wrote: > > arch_dma_set_uncached works as well in this case. However, mips, > > niops2 & xtensa uses a > > fixed (via config) value for the offset. Similar approach can't be > > used here because the platform specific > > offset value has to be determined at runtime so that a single kernel > > image can boot on all platforms. > > Nothing in the interface requires a fixed offset. And using the offset > has one enormous advantage in that there is no need to declare a > statically sized pool - allocations are fully dynamic. And any kind of > fixed pool tends to cause huge problems. > > > 1. a new DT property so that arch specific code is aware of the > > non-cacheable window offset. > > Yes. > > > individual device if a per-device non-cacheable > >window support is required in future. As of now, the beagleV memory > > If you require a per-device noncachable area you can use the per-device > coherent pools. But why would you want that? > > > region lies in 0x10__0 - x17__ > >which is mapped to start of DRAM 0x8000. All of the > > non-coherent devices can do 32bit DMA only. > > Adjust ZONE_DMA32 so that it takes the uncached offset into account. > > > > > - mem = dma_init_coherent_memory(phys_addr, phys_addr, size, true); > > > > + if (phys_addr == device_addr) > > > > + mem = dma_init_coherent_memory(phys_addr, device_addr, > > > > size, true); > > > > + else > > > > + mem = dma_init_coherent_memory(phys_addr, device_addr, > > > > size, false); > > > > > > Nak. The phys_addr != device_addr support is goign away. This needs > > > > ok. > > > > > to be filled in using dma-ranges property hanging of the struct device. > > > > struct device is only accessible in rmem_dma_device_init. I couldn't > > find a proper way to access it during > > dma_reserved_default_memory setup under global pool. 
> >
> > Does that mean we should use a per-device memory pool instead of a
> > global non-coherent pool ?
>
> Indeed, that would be a problem in this case. But if we can just
> use the uncached offset directly I think everything will be much
> simpler.

Yes. I was planning to change this to use an uncached offset. However, the
planned mass production for the BeagleV Starlight SBC is cancelled now [1].
As there is no other board that requires an uncached offset, I don't think
there is any use case for adding uncached offset support for RISC-V right
now. I will revisit this (hopefully we won't have to) in case any platform
implements uncached window support in future.

[1] https://www.cnx-software.com/2021/07/31/beaglev-starlight-sbc-wont-be-mass-manufactured-redesigned-beaglev-risc-v-sbc-expected-in-q1-2022/

--
Regards,
Atish
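For readers following the thread: a global pool of this kind is normally declared through the reserved-memory binding, and the bus-address-to-CPU-address translation Christoph suggests deriving at runtime comes from a dma-ranges property on the bus node. The fragment below is purely illustrative — every address, size, and label is made up, not taken from the BeagleV discussion:

```dts
/ {
	soc {
		compatible = "simple-bus";
		#address-cells = <2>;
		#size-cells = <2>;
		/* Device (child) addresses starting at 0x10_00000000 alias
		 * CPU (parent) DRAM at 0x80000000; the uncached offset would
		 * be derived from this translation at runtime. Example
		 * values only. */
		dma-ranges = <0x10 0x00000000 0x0 0x80000000 0x1 0x00000000>;
	};

	reserved-memory {
		#address-cells = <2>;
		#size-cells = <2>;
		ranges;

		/* A statically sized global pool for non-coherent devices --
		 * the kind of fixed pool the thread notes tends to cause
		 * problems, shown here only for contrast with the offset
		 * approach. */
		dma_noncoherent: noncoherent-pool@80000000 {
			compatible = "shared-dma-pool";
			reg = <0x0 0x80000000 0x0 0x4000000>; /* 64 MiB */
			no-map;
		};
	};
};
```

The dma-ranges route avoids declaring any pool at all: allocations stay fully dynamic and the arch code only applies the offset when mapping.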
Re: [PATCH v4 5/6] iova: Add iova_len argument to init_iova_domain()
On 02/08/2021 17:40, John Garry wrote:

On 02/08/2021 17:16, Robin Murphy wrote:

On 2021-08-02 17:06, John Garry wrote:

On 02/08/2021 16:06, Will Deacon wrote:

On Wed, Jul 14, 2021 at 06:36:42PM +0800, John Garry wrote:

Add max opt argument to init_iova_domain(), and use it to set the rcaches
range. Also fix up all users to set this value (at 0, meaning use default).

Wrap that in init_iova_domain_defaults() to avoid the mysterious 0?

Sure, I can do something like that. I actually did have separate along those
lines in v3 before I decided to change it.

Y'know, at this point I'm now starting to seriously wonder whether moving the
rcaches into iommu_dma_cookie wouldn't make a whole lot of things simpler... :/

As I see, the rcache stuff isn't really specific to IOVA anyway, so it seems
sane.

Does that sound like crazy talk to you, or an idea worth entertaining?

If you're going to start moving things, has anyone considered putting rcache
support in lib as a generic solution to the "Magazines and Vmem: .." paper?

Having said that, I still think that the rcache code has certain scalability
issues, as discussed before. So making it more generic and then discarding it
would be less than ideal.

Thanks,
John
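The "Magazines and Vmem" reference is to Bonwick's allocator paper; the core idea the IOVA rcaches borrow from it is a small per-CPU stack (a "magazine") of recently freed objects, so most alloc/free pairs never touch the shared allocator. A minimal userspace sketch of just that idea — all names and sizes here are hypothetical, with none of the kernel's per-CPU machinery, locking, or depot layer:

```c
#include <stdbool.h>
#include <stddef.h>

#define MAG_SIZE 16  /* arbitrary; the kernel's magazines are larger */

/* One magazine: a fixed-size stack of cached object "handles"
 * (for the IOVA case these would be base pfns of freed ranges). */
struct magazine {
	unsigned long pfns[MAG_SIZE];
	size_t size;
};

/* Try to satisfy an allocation from the cache; returns 0 on a miss,
 * in which case the caller falls back to the slow path (the rbtree). */
static unsigned long mag_alloc(struct magazine *mag)
{
	if (mag->size == 0)
		return 0;
	return mag->pfns[--mag->size];
}

/* Try to cache a freed object; returns false when the magazine is full,
 * in which case the caller frees to the underlying allocator (or, as in
 * the paper, swaps the full magazine with an empty one from a depot). */
static bool mag_free(struct magazine *mag, unsigned long pfn)
{
	if (mag->size == MAG_SIZE)
		return false;
	mag->pfns[mag->size++] = pfn;
	return true;
}
```

A generic lib/ version would need exactly the pieces elided here — per-CPU magazines, a locked depot of full/empty magazines, and a size-class per cached order — which is where the scalability concerns mentioned above come in.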
Re: [PATCH v4 5/6] iova: Add iova_len argument to init_iova_domain()
On 02/08/2021 17:16, Robin Murphy wrote:

On 2021-08-02 17:06, John Garry wrote:

On 02/08/2021 16:06, Will Deacon wrote:

On Wed, Jul 14, 2021 at 06:36:42PM +0800, John Garry wrote:

Add max opt argument to init_iova_domain(), and use it to set the rcaches
range. Also fix up all users to set this value (at 0, meaning use default).

Wrap that in init_iova_domain_defaults() to avoid the mysterious 0?

Sure, I can do something like that. I actually did have separate along those
lines in v3 before I decided to change it.

Y'know, at this point I'm now starting to seriously wonder whether moving the
rcaches into iommu_dma_cookie wouldn't make a whole lot of things simpler... :/

As I see, the rcache stuff isn't really specific to IOVA anyway, so it seems
sane.

Does that sound like crazy talk to you, or an idea worth entertaining?

If you're going to start moving things, has anyone considered putting rcache
support in lib as a generic solution to the "Magazines and Vmem: .." paper?

Thanks,
John
Re: [PATCH v4 5/6] iova: Add iova_len argument to init_iova_domain()
On 2021-08-02 17:06, John Garry wrote:

On 02/08/2021 16:06, Will Deacon wrote:

On Wed, Jul 14, 2021 at 06:36:42PM +0800, John Garry wrote:

Add max opt argument to init_iova_domain(), and use it to set the rcaches
range. Also fix up all users to set this value (at 0, meaning use default).

Wrap that in init_iova_domain_defaults() to avoid the mysterious 0?

Sure, I can do something like that. I actually did have separate along those
lines in v3 before I decided to change it.

Y'know, at this point I'm now starting to seriously wonder whether moving the
rcaches into iommu_dma_cookie wouldn't make a whole lot of things simpler... :/

Does that sound like crazy talk to you, or an idea worth entertaining?

Robin.
Re: [PATCH] iommu/arm-smmu: Add clk_bulk_{prepare/unprepare} to system pm callbacks
On Tue, Jul 27, 2021 at 03:03:22PM +0530, Sai Prakash Ranjan wrote:
> Some clocks for SMMU can have parent as XO such as gpu_cc_hub_cx_int_clk
> of GPU SMMU in QTI SC7280 SoC and in order to enter deep sleep states in
> such cases, we would need to drop the XO clock vote in unprepare call and
> this unprepare callback for XO is in RPMh (Resource Power Manager-Hardened)
> clock driver which controls RPMh managed clock resources for new QTI SoCs
> and is a blocking call.
>
> Given we cannot have a sleeping calls such as clk_bulk_prepare() and
> clk_bulk_unprepare() in arm-smmu runtime pm callbacks since the iommu
> operations like map and unmap can be in atomic context and are in fast
> path, add this prepare and unprepare call to drop the XO vote only for
> system pm callbacks since it is not a fast path and we expect the system
> to enter deep sleep states with system pm as opposed to runtime pm.
>
> This is a similar sequence of clock requests (prepare,enable and
> disable,unprepare) in arm-smmu probe and remove.
>
> Signed-off-by: Sai Prakash Ranjan
> Co-developed-by: Rajendra Nayak
> Signed-off-by: Rajendra Nayak
> ---
> drivers/iommu/arm/arm-smmu/arm-smmu.c | 20 ++--
> 1 file changed, 18 insertions(+), 2 deletions(-)

[+Rob]

How does this work with that funny GPU which writes to the SMMU registers
directly? Does the SMMU need to remain independently clocked for that to
work or is it all in the same clock domain?
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index d3c6f54110a5..9561ba4c5d39 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -2277,6 +2277,13 @@ static int __maybe_unused
> arm_smmu_runtime_suspend(struct device *dev)
>
> static int __maybe_unused arm_smmu_pm_resume(struct device *dev)
> {
> + int ret;
> + struct arm_smmu_device *smmu = dev_get_drvdata(dev);
> +
> + ret = clk_bulk_prepare(smmu->num_clks, smmu->clks);
> + if (ret)
> + return ret;
> +
> if (pm_runtime_suspended(dev))
> return 0;

If we subsequently fail to enable the clks in arm_smmu_runtime_resume()
should we unprepare them again?

Will

> @@ -2285,10 +2292,19 @@ static int __maybe_unused arm_smmu_pm_resume(struct
> device *dev)
>
> static int __maybe_unused arm_smmu_pm_suspend(struct device *dev)
> {
> + int ret = 0;
> + struct arm_smmu_device *smmu = dev_get_drvdata(dev);
> +
> if (pm_runtime_suspended(dev))
> - return 0;
> + goto clk_unprepare;
>
> - return arm_smmu_runtime_suspend(dev);
> + ret = arm_smmu_runtime_suspend(dev);
> + if (ret)
> + return ret;
> +
> +clk_unprepare:
> + clk_bulk_unprepare(smmu->num_clks, smmu->clks);
> + return ret;
> }
>
> static const struct dev_pm_ops arm_smmu_pm_ops = {
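The control flow under review can be modelled in userspace with stubbed clock and runtime-PM helpers — everything below is hypothetical scaffolding, not kernel code. On system resume, prepare is taken unconditionally; on system suspend, the runtime-suspend step (clock disable) is skipped when the device is already runtime-suspended, but unprepare happens either way, keeping the prepare count balanced:

```c
#include <stdbool.h>

/* Stub state standing in for smmu->clks and the runtime-PM core. */
static int prepares, enables;
static bool runtime_suspended;

static int clk_bulk_prepare_stub(void)    { prepares++; return 0; }
static void clk_bulk_unprepare_stub(void) { prepares--; }
static int runtime_resume_stub(void)      { enables++; return 0; }
static int runtime_suspend_stub(void)     { enables--; return 0; }

/* Mirrors the proposed arm_smmu_pm_resume(): prepare first, then only
 * enable if the device was running before the system slept. */
static int pm_resume_model(void)
{
	int ret = clk_bulk_prepare_stub();
	if (ret)
		return ret;
	if (runtime_suspended)
		return 0;
	return runtime_resume_stub();
}

/* Mirrors the proposed arm_smmu_pm_suspend(): disable only if running,
 * but always drop the prepare count so the XO vote is released. */
static int pm_suspend_model(void)
{
	int ret = 0;
	if (!runtime_suspended) {
		ret = runtime_suspend_stub();
		if (ret)
			return ret;
	}
	clk_bulk_unprepare_stub();
	return ret;
}
```

Running a suspend/resume pair in either runtime-PM state leaves the prepare and enable counts balanced, which is the invariant the patch is after. Will's question is precisely about the one path this model also exhibits: if the enable step inside resume fails, the prepare taken just before it is left dangling.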
Re: [PATCH v4 2/6] iova: Allow rcache range upper limit to be flexible
On 2021-08-02 16:23, John Garry wrote: On 02/08/2021 16:01, Will Deacon wrote: On Wed, Jul 14, 2021 at 06:36:39PM +0800, John Garry wrote: Some LLDs may request DMA mappings whose IOVA length exceeds that of the current rcache upper limit. What's an LLD? low-level driver maybe I'll stick with simply "drivers" This means that allocations for those IOVAs will never be cached, and always must be allocated and freed from the RB tree per DMA mapping cycle. This has a significant effect on performance, more so since commit 4e89dce72521 ("iommu/iova: Retry from last rb tree node if iova search fails"), as discussed at [0]. As a first step towards allowing the rcache range upper limit be configured, hold this value in the IOVA rcache structure, and allocate the rcaches separately. [0] https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leiz...@huawei.com/ Signed-off-by: John Garry --- drivers/iommu/dma-iommu.c | 2 +- drivers/iommu/iova.c | 23 +-- include/linux/iova.h | 4 ++-- 3 files changed, 20 insertions(+), 9 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 98ba927aee1a..4772278aa5da 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -434,7 +434,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain, * rounding up anything cacheable to make sure that can't happen. The * order of the unadjusted size will still match upon freeing. 
*/ - if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1))) + if (iova_len < (1 << (iovad->rcache_max_size - 1))) iova_len = roundup_pow_of_two(iova_len); dma_limit = min_not_zero(dma_limit, dev->bus_dma_limit); diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index b6cf5f16123b..07ce73fdd8c1 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -15,6 +15,8 @@ /* The anchor node sits above the top of the usable address space */ #define IOVA_ANCHOR ~0UL +#define IOVA_RANGE_CACHE_MAX_SIZE 6 /* log of max cached IOVA range size (in pages) */ Is that the same as an 'order'? i.e. IOVA_RANGE_CACHE_MAX_ORDER? Yeah, that may be better. I was just using the same name as before. + static bool iova_rcache_insert(struct iova_domain *iovad, unsigned long pfn, unsigned long size); @@ -881,7 +883,14 @@ static void init_iova_rcaches(struct iova_domain *iovad) unsigned int cpu; int i; - for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) { + iovad->rcache_max_size = IOVA_RANGE_CACHE_MAX_SIZE; + + iovad->rcaches = kcalloc(iovad->rcache_max_size, + sizeof(*iovad->rcaches), GFP_KERNEL); + if (!iovad->rcaches) + return; Returning quietly here doesn't seem like the right thing to do. At least, I don't think the rest of the functions here are checking rcaches against NULL. For sure, but that is what other code which can fail here already does, like: static void init_iova_rcaches(struct iova_domain *iovad) { ... for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) { ... rcache->cpu_rcaches = __alloc_percpu(sizeof(*cpu_rcache), cache_line_size()); if (WARN_ON(!rcache->cpu_rcaches)) continue; } and that is not safe either. Yeah, along with flush queues, historically this has all been super-dodgy in terms of failure handling (or lack of). This issue was raised a while ago. I don't mind trying to fix it - a slightly painful part is that it touches a few subsystems. Maybe pull the rcache init out of iova_domain_init() entirely? 
Only iommu-dma uses {alloc,free}_iova_fast(), so TBH it's only a great big
waste of memory for all the other IOVA domain users anyway. The other week I
started pondering how much of iommu-dma only needs to be exposed to the IOMMU
core rather than the whole kernel now; I suppose there's probably an equal
argument to be made for some of these bits of the IOVA API, and this might
pave the way towards some more logical separation, but let's get the
functional side dealt with before we worry too much about splitting headers.

Robin.
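For reference, the limit the patch makes configurable works out as follows: with IOVA_RANGE_CACHE_MAX_SIZE = 6, orders 0 through 5 are cached, so the largest cacheable IOVA range is 2^5 = 32 pages (128KB with a 4K page), and iommu_dma_alloc_iova() rounds anything below that up to a power of two so it can hit a cache bucket. A standalone sketch of that arithmetic (helper names are illustrative, mirroring but not taken from the kernel):

```c
/* Log of max cached IOVA range size, in pages -- as in the patch. */
#define IOVA_RANGE_CACHE_MAX_SIZE 6

/* Smallest power of two >= n (what the kernel's roundup_pow_of_two()
 * computes for the cases used here). */
static unsigned long roundup_pow2(unsigned long n)
{
	unsigned long p = 1;
	while (p < n)
		p <<= 1;
	return p;
}

/* Largest IOVA length (in pages) the rcaches will ever hold: orders
 * 0..MAX_SIZE-1 are cached, so the top bucket holds 1 << (MAX_SIZE - 1)
 * pages. */
static unsigned long max_cached_pages(void)
{
	return 1UL << (IOVA_RANGE_CACHE_MAX_SIZE - 1);
}

/* The rounding iommu_dma_alloc_iova() applies so that sub-limit sizes
 * land on a cacheable power-of-two length; larger sizes are left as-is
 * and always take the rbtree slow path. */
static unsigned long cacheable_len(unsigned long iova_len)
{
	if (iova_len < max_cached_pages())
		iova_len = roundup_pow2(iova_len);
	return iova_len;
}
```

This is why drivers issuing DMA mappings just over 128KB see the cliff described in the cover letter: a 33-page mapping misses every cache bucket on both alloc and free.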
Re: [PATCH v4 5/6] iova: Add iova_len argument to init_iova_domain()
On 02/08/2021 16:06, Will Deacon wrote:

On Wed, Jul 14, 2021 at 06:36:42PM +0800, John Garry wrote:

Add max opt argument to init_iova_domain(), and use it to set the rcaches
range. Also fix up all users to set this value (at 0, meaning use default).

Wrap that in init_iova_domain_defaults() to avoid the mysterious 0?

Sure, I can do something like that. I actually did have separate along those
lines in v3 before I decided to change it.

Thanks,
John
Re: [Patch V2 0/2] iommu/arm-smmu: Fix races in iommu domain/group creation
On 2021-08-02 16:16, Will Deacon wrote:

On Fri, Jun 18, 2021 at 02:00:35AM +0530, Ashish Mhetre wrote:

Multiple iommu domains and iommu groups are getting created for the devices
sharing same SID. It is expected for devices sharing same SID to be in same
iommu group and same iommu domain. This is leading to context faults when one
device is accessing IOVA from other device which shouldn't be the case for
devices sharing same SID. Fix this by protecting iommu domain and iommu group
creation with mutexes.

Robin -- any chance you could take a look at these, please? You had some
comments on the first version which convinced me that they are needed, but I
couldn't tell whether you wanted to solve this a different way or not.

Sorry, I was lamenting that this came to light due to the of_iommu_configure()
flow being yucky, but that wasn't meant to imply that there aren't - or
couldn't be in future - better reasons for iommu_probe_device() to be robust
against concurrency anyway. I do think these are legitimate fixes to make in
their own right, even if the current need might get swept back under the rug
in future.

I would say, however, that the commit messages seem to focus too much on the
wrong details and aren't overly useful, and patch #2 is missing Ashish's
sign-off.

Thanks,
Robin.
Re: [PATCHv3] iommu/arm-smmu: Optimize ->tlb_flush_walk() for qcom implementation
On Wed, Jun 23, 2021 at 07:12:01PM +0530, Sai Prakash Ranjan wrote:
> Currently for iommu_unmap() of large scatter-gather list with page size
> elements, the majority of time is spent in flushing of partial walks in
> __arm_lpae_unmap() which is a VA based TLB invalidation invalidating
> page-by-page on iommus like arm-smmu-v2 (TLBIVA).
>
> For example: to unmap a 32MB scatter-gather list with page size elements
> (8192 entries), there are 16->2MB buffer unmaps based on the pgsize (2MB
> for 4K granule) and each of 2MB will further result in 512 TLBIVAs (2MB/4K)
> resulting in a total of 8192 TLBIVAs (512*16) for 16->2MB causing a huge
> overhead.
>
> On qcom implementation, there are several performance improvements for
> TLB cache invalidations in HW like wait-for-safe (for realtime clients
> such as camera and display) and few others to allow for cache
> lookups/updates when TLBI is in progress for the same context bank.
> So the cost of over-invalidation is less compared to the unmap latency
> on several usecases like camera which deals with large buffers. So,
> ASID based TLB invalidations (TLBIASID) can be used to invalidate the
> entire context for partial walk flush thereby improving the unmap
> latency.
>
> Non-strict mode can use this by default for all platforms given its
> all about over-invalidation saving time on individual unmaps and
> non-deterministic generally.
>
> For this example of 32MB scatter-gather list unmap, this change results
> in just 16 ASID based TLB invalidations (TLBIASIDs) as opposed to 8192
> TLBIVAs thereby increasing the performance of unmaps drastically.
>
> Test on QTI SM8150 SoC for 10 iterations of iommu_{map_sg}/unmap:
> (average over 10 iterations)
>
> Before this optimization:
>
> size    iommu_map_sg    iommu_unmap
> 4K      2.067 us        1.854 us
> 64K     9.598 us        8.802 us
> 1M      148.890 us      130.718 us
> 2M      305.864 us      67.291 us
> 12M     1793.604 us     390.838 us
> 16M     2386.848 us     518.187 us
> 24M     3563.296 us     775.989 us
> 32M     4747.171 us     1033.364 us
>
> After this optimization:
>
> size    iommu_map_sg    iommu_unmap
> 4K      1.723 us        1.765 us
> 64K     9.880 us        8.869 us
> 1M      155.364 us      135.223 us
> 2M      303.906 us      5.385 us
> 12M     1786.557 us     21.250 us
> 16M     2391.890 us     27.437 us
> 24M     3570.895 us     39.937 us
> 32M     4755.234 us     51.797 us
>
> This is further reduced once the map/unmap_pages() support gets in which
> will result in just 1 TLBIASID as compared to 16 TLBIASIDs.
>
> Real world data also shows big difference in unmap performance as below:
>
> There were reports of camera frame drops because of high overhead in
> iommu unmap without this optimization because of frequent unmaps issued
> by camera of about 100MB/s taking more than 100ms thereby causing frame
> drops.
>
> Signed-off-by: Sai Prakash Ranjan
> ---
>
> Changes in v3:
> * Move the logic to arm-smmu driver from io-pgtable (Robin)
> * Use a new set of iommu_flush_ops->arm_smmu_s1_tlb_impl_ops and use it for
>   qcom impl
>
> Changes in v2:
> * Add a quirk to choose tlb_flush_all in partial walk flush
> * Set the quirk for QTI SoC implementation
>
> ---
> drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 13 +
> drivers/iommu/arm/arm-smmu/arm-smmu.c      | 17 -
> 2 files changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> index 7771d40176de..218c71465819 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> @@ -10,6 +10,8 @@
>
> #include "arm-smmu.h"
>
> +extern const struct iommu_flush_ops arm_smmu_s1_tlb_impl_ops;
> +
> struct qcom_smmu {
> struct arm_smmu_device smmu;
> bool bypass_quirk;
> @@ -146,6 +148,8 @@ static int qcom_adreno_smmu_init_context(struct
> arm_smmu_domain *smmu_domain,
> {
> struct adreno_smmu_priv *priv;
>
> + pgtbl_cfg->tlb = &arm_smmu_s1_tlb_impl_ops;
> +
> /* Only enable split pagetables for the GPU device (SID 0) */
> if (!qcom_adreno_smmu_is_gpu_device(dev))
> return 0;
> @@ -185,6 +189,14 @@ static const struct of_device_id
> qcom_smmu_client_of_match[] __maybe_unused = {
> { }
> };
>
> +static int qcom_smmu_init_context(struct arm_smmu_domain *smmu_domain,
> + struct io_pgtable_cfg *pgtbl_cfg, struct device *dev)
> +{
> + pgtbl_cfg->tlb = &arm_smmu_s1_tlb_impl_ops;
> +
> + return 0;
> +}
> +
> static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)
> {
> unsigned int last_s2cr = ARM_SMMU_GR0_S2CR(smmu->num_mapping_groups - 1);
> @@ -308,6 +320,7 @@
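The invalidation counts in the commit message can be checked with a little arithmetic: the 32MB scatter-gather list is unmapped as 16 blocks of 2MB (the block size for a 4K granule), VA-based flushing then issues one TLBIVA per 4K page of each block, while ASID-based flushing issues a single TLBIASID per partial-walk flush, i.e. per block. A sketch (helper names are illustrative, not from the patch):

```c
/* Sizes from the commit message: 32MB SG list, 2MB unmap blocks, 4K pages. */
static const unsigned long SZ_4K  = 4096;
static const unsigned long SZ_2M  = 2UL * 1024 * 1024;
static const unsigned long SZ_32M = 32UL * 1024 * 1024;

/* VA-based flush: one TLBIVA for every page of every unmapped block. */
static unsigned long tlbiva_count(unsigned long total, unsigned long block,
				  unsigned long page)
{
	return (total / block) * (block / page);
}

/* ASID-based flush: one TLBIASID per partial-walk flush, i.e. per block;
 * with map/unmap_pages() batching the whole unmap, this drops to 1. */
static unsigned long tlbiasid_count(unsigned long total, unsigned long block)
{
	return total / block;
}
```

That is the 8192-versus-16 ratio behind the 2MB-and-up unmap numbers in the table: the per-block cost stops scaling with block size once a single TLBIASID replaces hundreds of TLBIVAs.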
Re: [PATCH v4 2/6] iova: Allow rcache range upper limit to be flexible
On 02/08/2021 16:01, Will Deacon wrote: On Wed, Jul 14, 2021 at 06:36:39PM +0800, John Garry wrote: Some LLDs may request DMA mappings whose IOVA length exceeds that of the current rcache upper limit. What's an LLD? low-level driver maybe I'll stick with simply "drivers" This means that allocations for those IOVAs will never be cached, and always must be allocated and freed from the RB tree per DMA mapping cycle. This has a significant effect on performance, more so since commit 4e89dce72521 ("iommu/iova: Retry from last rb tree node if iova search fails"), as discussed at [0]. As a first step towards allowing the rcache range upper limit be configured, hold this value in the IOVA rcache structure, and allocate the rcaches separately. [0] https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leiz...@huawei.com/ Signed-off-by: John Garry --- drivers/iommu/dma-iommu.c | 2 +- drivers/iommu/iova.c | 23 +-- include/linux/iova.h | 4 ++-- 3 files changed, 20 insertions(+), 9 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 98ba927aee1a..4772278aa5da 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -434,7 +434,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain, * rounding up anything cacheable to make sure that can't happen. The * order of the unadjusted size will still match upon freeing. */ - if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1))) + if (iova_len < (1 << (iovad->rcache_max_size - 1))) iova_len = roundup_pow_of_two(iova_len); dma_limit = min_not_zero(dma_limit, dev->bus_dma_limit); diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index b6cf5f16123b..07ce73fdd8c1 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -15,6 +15,8 @@ /* The anchor node sits above the top of the usable address space */ #define IOVA_ANCHOR ~0UL +#define IOVA_RANGE_CACHE_MAX_SIZE 6/* log of max cached IOVA range size (in pages) */ Is that the same as an 'order'? i.e. 
IOVA_RANGE_CACHE_MAX_ORDER? Yeah, that may be better. I was just using the same name as before. + static bool iova_rcache_insert(struct iova_domain *iovad, unsigned long pfn, unsigned long size); @@ -881,7 +883,14 @@ static void init_iova_rcaches(struct iova_domain *iovad) unsigned int cpu; int i; - for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) { + iovad->rcache_max_size = IOVA_RANGE_CACHE_MAX_SIZE; + + iovad->rcaches = kcalloc(iovad->rcache_max_size, +sizeof(*iovad->rcaches), GFP_KERNEL); + if (!iovad->rcaches) + return; Returning quietly here doesn't seem like the right thing to do. At least, I don't think the rest of the functions here are checking rcaches against NULL. For sure, but that is what other code which can fail here already does, like: static void init_iova_rcaches(struct iova_domain *iovad) { ... for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) { ... rcache->cpu_rcaches = __alloc_percpu(sizeof(*cpu_rcache), cache_line_size()); if (WARN_ON(!rcache->cpu_rcaches)) continue; } and that is not safe either. This issue was raised a while ago. I don't mind trying to fix it - a slightly painful part is that it touches a few subsystems. Thanks, John ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [Patch V2 0/2] iommu/arm-smmu: Fix races in iommu domain/group creation
On Fri, Jun 18, 2021 at 02:00:35AM +0530, Ashish Mhetre wrote:
> Multiple iommu domains and iommu groups are getting created for the devices
> sharing same SID. It is expected for devices sharing same SID to be in same
> iommu group and same iommu domain.
> This is leading to context faults when one device is accessing IOVA from
> other device which shouldn't be the case for devices sharing same SID.
> Fix this by protecting iommu domain and iommu group creation with mutexes.

Robin -- any chance you could take a look at these, please? You had some
comments on the first version which convinced me that they are needed, but I
couldn't tell whether you wanted to solve this a different way or not.

Cheers,

Will
Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote: > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon wrote: > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote: > > > On 2021-07-28 19:30, Georgi Djakov wrote: > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote: > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag") > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went > > > > > the memory type setting required for the non-coherent masters to use > > > > > system cache. Now that system cache support for GPU is added, we will > > > > > need to set the right PTE attribute for GPU buffers to be sys cached. > > > > > Without this, the system cache lines are not allocated for GPU. > > > > > > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC, > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC > > > > > and makes GPU the user of this protection flag. > > > > > > > > Thank you for the patchset! Are you planning to refresh it, as it does > > > > not apply anymore? > > > > > > > > > > I was waiting on Will's reply [1]. If there are no changes needed, then > > > I can repost the patch. > > > > I still think you need to handle the mismatched alias, no? You're adding > > a new memory type to the SMMU which doesn't exist on the CPU side. That > > can't be right. > > > > Just curious, and maybe this is a dumb question, but what is your > concern about mismatched aliases? I mean the cache hierarchy on the > GPU device side (anything beyond the LLC) is pretty different and > doesn't really care about the smmu pgtable attributes.. If the CPU accesses a shared buffer with different attributes to those which the device is using then you fall into the "mismatched memory attributes" part of the Arm architecture. 
It's reasonably unforgiving (you should go and read it) and in some cases can apply to speculative accesses as well, but the end result is typically loss of coherency. Will
Re: [PATCH v4 5/6] iova: Add iova_len argument to init_iova_domain()
On Wed, Jul 14, 2021 at 06:36:42PM +0800, John Garry wrote: > Add max opt argument to init_iova_domain(), and use it to set the rcaches > range. > > Also fix up all users to set this value (at 0, meaning use default). Wrap that in init_iova_domain_defaults() to avoid the mysterious 0? Will
[PATCH] iommu/amd: Remove stale amd_iommu_unmap_flush usage
From: Joerg Roedel Remove the new use of the variable introduced in the AMD driver branch. The variable was removed already in the iommu core branch, causing build errors when the brances are merged. Cc: Nadav Amit Cc: Zhen Lei Signed-off-by: Joerg Roedel --- drivers/iommu/amd/init.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index 239556c1f698..bdcf167b4afe 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -1850,11 +1850,9 @@ static int __init iommu_init_pci(struct amd_iommu *iommu) return ret; if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE)) { - if (!amd_iommu_unmap_flush) - pr_info("IOMMU batching is disabled due to virtualization\n"); - + pr_info("Using strict mode due to virtualization\n"); + iommu_set_dma_strict(); amd_iommu_np_cache = true; - amd_iommu_unmap_flush = true; } init_iommu_perf_ctr(iommu); -- 2.31.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
On Mon, Aug 2, 2021 at 3:55 AM Will Deacon wrote: > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote: > > On 2021-07-28 19:30, Georgi Djakov wrote: > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote: > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag") > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went > > > > the memory type setting required for the non-coherent masters to use > > > > system cache. Now that system cache support for GPU is added, we will > > > > need to set the right PTE attribute for GPU buffers to be sys cached. > > > > Without this, the system cache lines are not allocated for GPU. > > > > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC, > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC > > > > and makes GPU the user of this protection flag. > > > > > > Thank you for the patchset! Are you planning to refresh it, as it does > > > not apply anymore? > > > > > > > I was waiting on Will's reply [1]. If there are no changes needed, then > > I can repost the patch. > > I still think you need to handle the mismatched alias, no? You're adding > a new memory type to the SMMU which doesn't exist on the CPU side. That > can't be right. > Just curious, and maybe this is a dumb question, but what is your concern about mismatched aliases? I mean the cache hierarchy on the GPU device side (anything beyond the LLC) is pretty different and doesn't really care about the smmu pgtable attributes.. BR, -R ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 2/6] iova: Allow rcache range upper limit to be flexible
On Wed, Jul 14, 2021 at 06:36:39PM +0800, John Garry wrote: > Some LLDs may request DMA mappings whose IOVA length exceeds that of the > current rcache upper limit. What's an LLD? > This means that allocations for those IOVAs will never be cached, and > always must be allocated and freed from the RB tree per DMA mapping cycle. > This has a significant effect on performance, more so since commit > 4e89dce72521 ("iommu/iova: Retry from last rb tree node if iova search > fails"), as discussed at [0]. > > As a first step towards allowing the rcache range upper limit be > configured, hold this value in the IOVA rcache structure, and allocate > the rcaches separately. > > [0] > https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leiz...@huawei.com/ > > Signed-off-by: John Garry > --- > drivers/iommu/dma-iommu.c | 2 +- > drivers/iommu/iova.c | 23 +-- > include/linux/iova.h | 4 ++-- > 3 files changed, 20 insertions(+), 9 deletions(-) > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > index 98ba927aee1a..4772278aa5da 100644 > --- a/drivers/iommu/dma-iommu.c > +++ b/drivers/iommu/dma-iommu.c > @@ -434,7 +434,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct > iommu_domain *domain, >* rounding up anything cacheable to make sure that can't happen. The >* order of the unadjusted size will still match upon freeing. >*/ > - if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1))) > + if (iova_len < (1 << (iovad->rcache_max_size - 1))) > iova_len = roundup_pow_of_two(iova_len); > > dma_limit = min_not_zero(dma_limit, dev->bus_dma_limit); > diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c > index b6cf5f16123b..07ce73fdd8c1 100644 > --- a/drivers/iommu/iova.c > +++ b/drivers/iommu/iova.c > @@ -15,6 +15,8 @@ > /* The anchor node sits above the top of the usable address space */ > #define IOVA_ANCHOR ~0UL > > +#define IOVA_RANGE_CACHE_MAX_SIZE 6/* log of max cached IOVA range size > (in pages) */ Is that the same as an 'order'? i.e. 
IOVA_RANGE_CACHE_MAX_ORDER? > + > static bool iova_rcache_insert(struct iova_domain *iovad, > unsigned long pfn, > unsigned long size); > @@ -881,7 +883,14 @@ static void init_iova_rcaches(struct iova_domain *iovad) > unsigned int cpu; > int i; > > - for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) { > + iovad->rcache_max_size = IOVA_RANGE_CACHE_MAX_SIZE; > + > + iovad->rcaches = kcalloc(iovad->rcache_max_size, > + sizeof(*iovad->rcaches), GFP_KERNEL); > + if (!iovad->rcaches) > + return; Returning quietly here doesn't seem like the right thing to do. At least, I don't think the rest of the functions here are checking rcaches against NULL. Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/2] Implement [map/unmap]_pages callbacks for ARM SMMUV3
On Mon, Aug 02, 2021 at 03:43:20PM +0100, Will Deacon wrote: > For both patches: > > Acked-by: Will Deacon > > Joerg -- please can you take these directly? They build on top of the > changes from Isaac which you have queued on your 'core' branch. Sure, applied to core branch now. Thanks, Joerg
Re: [PATCH v4 1/6] iommu: Refactor iommu_group_store_type()
On Wed, Jul 14, 2021 at 06:36:38PM +0800, John Garry wrote: > Function iommu_group_store_type() supports changing the default domain > of an IOMMU group. > > Many conditions need to be satisfied and steps taken for this action to be > successful. > > Satisfying these conditions and steps will be required for setting other > IOMMU group attributes, so factor into a common part and a part specific > to update the IOMMU group attribute. > > No functional change intended. > > Some code comments are tidied up also. > > Signed-off-by: John Garry > --- > drivers/iommu/iommu.c | 73 +++ > 1 file changed, 46 insertions(+), 27 deletions(-) Acked-by: Will Deacon Although likely to conflict with Robin's monster series. Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/2] Implement [map/unmap]_pages callbacks for ARM SMMUV3
On Sat, Jul 31, 2021 at 10:17:09AM +0800, chenxiang wrote:
> From: Xiang Chen
>
> The series ("Optimizing iommu_[map/unmap] performance") improve the
> iommu_[map/unmap] performance. Based on the series, implement
> [map/unmap]_pages callbacks for ARM SMMUV3.
> Use tool dma_map_benchmark to test the latency of map/unmap, and it promotes
> much on it. The test result is as follows:
> t = 1 (thread = 1):
>                      before opt(us)   after opt(us)
> g=1    (4K size)     0.1/1.3          0.1/0.8
> g=2    (8K size)     0.2/1.5          0.2/0.9
> g=4    (16K size)    0.3/1.9          0.1/1.1
> g=8    (32K size)    0.5/2.7          0.2/1.4
> g=16   (64K size)    1.0/4.5          0.2/2.0
> g=32   (128K size)   1.8/7.9          0.2/3.3
> g=64   (256K size)   3.7/14.8         0.4/6.1
> g=128  (512K size)   7.1/14.7         0.5/10.4
> g=256  (1M size)     14.0/53.9        0.8/19.3
> g=512  (2M size)     0.2/0.9          0.2/0.9
> g=1024 (4M size)     0.5/1.5          0.4/1.0
>
> t = 10 (thread = 10):
>                      before opt(us)   after opt(us)
> g=1    (4K size)     0.3/7.0          0.1/5.8
> g=2    (8K size)     0.4/6.7          0.3/6.0
> g=4    (16K size)    0.5/6.3          0.3/5.6
> g=8    (32K size)    0.5/8.3          0.2/6.3
> g=16   (64K size)    1.0/17.3         0.3/12.4
> g=32   (128K size)   1.8/36.0         0.2/24.2
> g=64   (256K size)   4.3/67.2         1.2/46.4
> g=128  (512K size)   7.8/93.7         1.3/94.2
> g=256  (1M size)     14.7/280.8       1.8/191.5
> g=512  (2M size)     3.6/3.2          1.5/2.5
> g=1024 (4M size)     2.0/3.1          1.8/2.6
>
> Xiang Chen (2):
>   iommu/arm-smmu-v3: Implement the unmap_pages() IOMMU driver callback
>   iommu/arm-smmu-v3: Implement the map_pages() IOMMU driver callback

For both patches:

Acked-by: Will Deacon

Joerg -- please can you take these directly? They build on top of the changes from Isaac which you have queued on your 'core' branch.

Cheers,
Will
Re: [PATCH v2] iommu: Check if group is NULL before remove device
On Sat, Jul 31, 2021 at 09:47:37AM +0200, Frank Wunderlich wrote: > Fixes: d72e31c93746 ("iommu: IOMMU Groups") > Signed-off-by: Frank Wunderlich > --- > v2: > - commit message with capital letters at the beginning of sentences > - added more information, many thanks to Yong Wu Applied, thanks.
Re: [PATCH v2 23/24] iommu/arm-smmu: Allow non-strict in pgtable_quirks interface
On 2021-08-02 14:04, Will Deacon wrote: On Wed, Jul 28, 2021 at 04:58:44PM +0100, Robin Murphy wrote: To make io-pgtable aware of a flush queue being dynamically enabled, allow IO_PGTABLE_QUIRK_NON_STRICT to be set even after a domain has been attached to, and hook up the final piece of the puzzle in iommu-dma. Signed-off-by: Robin Murphy --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 15 +++ drivers/iommu/arm/arm-smmu/arm-smmu.c | 11 +++ drivers/iommu/dma-iommu.c | 3 +++ 3 files changed, 29 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 19400826eba7..40fa9cb382c3 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2711,6 +2711,20 @@ static int arm_smmu_enable_nesting(struct iommu_domain *domain) return ret; } +static int arm_smmu_set_pgtable_quirks(struct iommu_domain *domain, + unsigned long quirks) +{ + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + + if (quirks == IO_PGTABLE_QUIRK_NON_STRICT && smmu_domain->pgtbl_ops) { + struct io_pgtable *iop = io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops); + + iop->cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT; + return 0; + } + return -EINVAL; +} I don't see anything serialising this against a concurrent iommu_unmap(), so the ordering and atomicity looks quite suspicious to me here. I don't think it's just the page-table quirks either, but also setting cookie->fq_domain. Heh, I confess to very much taking the cheeky "let's say nothing and see what Will thinks about concurrency" approach here :) The beauty of only allowing relaxation in the strict->non-strict direction is that it shouldn't need serialising as such - it doesn't matter if the update to cookie->fq_domain is observed between iommu_unmap() and iommu_dma_free_iova(), since there's no correctness impact to queueing IOVAs which may already have been invalidated and may or may not have been synced. 
AFAICS the only condition which matters is that the setting of the io-pgtable quirk must observe fq_domain being set. It feels like there must be enough dependencies on the read side, but we might need an smp_wmb() between the two in iommu_dma_init_fq()? I've also flip-flopped a bit on whether fq_domain needs to be accessed with READ_ONCE/WRITE_ONCE - by the time of posting I'd convinced myself that it was probably OK, but looking again now I suppose this wacky reordering is theoretically possible: iommu_dma_unmap() { bool free_fq = cookie->fq_domain; // == false iommu_unmap(); if (!cookie->fq_domain) // observes new non-NULL value iommu_tlb_sync(); // skipped iommu_dma_free_iova { // inlined if (free_fq) // false queue_iova(); else free_iova_fast(); // Uh-oh! } } so although I still can't see atomicity being a problem I guess we do need it for the sake of reordering at least. Cheers, Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver
On 8/2/2021 9:20 PM, Joerg Roedel wrote: On Wed, Jul 28, 2021 at 10:52:28AM -0400, Tianyu Lan wrote: In Isolation VM, all shared memory with host needs to mark visible to host via hvcall. vmbus_establish_gpadl() has already done it for storvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_ mpb_desc() still need to handle. Use DMA API to map/umap these memory during sending/receiving packet and Hyper-V DMA ops callback will use swiotlb function to allocate bounce buffer and copy data from/to bounce buffer. I am wondering why you don't use the DMA API unconditionally? It provides enough abstraction to do the right thing for isolated and legacy VMs. VMbus already has a similar bounce buffer design, so there is no need to call the DMA API for those buffers. The DMA API is called only to route the remaining buffers through the swiotlb bounce buffer. This is why the DMA API call is conditional.
Re: [PATCH v2 3/4] dma-iommu: pass SKIP_CPU_SYNC to swiotlb unmap
On Fri, Jul 09, 2021 at 12:35:01PM +0900, David Stevens wrote: > From: David Stevens > > If SKIP_CPU_SYNC isn't already set, then iommu_dma_unmap_(page|sg) has > already called iommu_dma_sync_(single|sg)_for_cpu, so there is no need > to copy from the bounce buffer again. > > Signed-off-by: David Stevens > --- > drivers/iommu/dma-iommu.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > index e79e274d2dc5..0a9a9a343e64 100644 > --- a/drivers/iommu/dma-iommu.c > +++ b/drivers/iommu/dma-iommu.c > @@ -505,7 +505,8 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, > dma_addr_t dma_addr, > __iommu_dma_unmap(dev, dma_addr, size); > > if (unlikely(is_swiotlb_buffer(phys))) > - swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs); > + swiotlb_tbl_unmap_single(dev, phys, size, dir, > + attrs | DMA_ATTR_SKIP_CPU_SYNC); > } I think it would be cleaner to drop DMA_ATTR_SKIP_CPU_SYNC in the callers once they've called iommu_dma_sync_*_for_cpu(). Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 2/4] dma-iommu: fix arch_sync_dma for map with swiotlb
On Mon, Aug 02, 2021 at 02:40:59PM +0100, Will Deacon wrote: > On Fri, Jul 09, 2021 at 12:35:00PM +0900, David Stevens wrote: > > From: David Stevens > > > > When calling arch_sync_dma, we need to pass it the memory that's > > actually being used for dma. When using swiotlb bounce buffers, this is > > the bounce buffer. Move arch_sync_dma into the __iommu_dma_map_swiotlb > > helper, so it can use the bounce buffer address if necessary. This also > > means it is no longer necessary to call iommu_dma_sync_sg_for_device in > > iommu_dma_map_sg for untrusted devices. > > > > Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers") > > Signed-off-by: David Stevens > > --- > > drivers/iommu/dma-iommu.c | 16 +++- > > 1 file changed, 7 insertions(+), 9 deletions(-) > > > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > > index eac65302439e..e79e274d2dc5 100644 > > --- a/drivers/iommu/dma-iommu.c > > +++ b/drivers/iommu/dma-iommu.c > > @@ -574,6 +574,9 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device > > *dev, phys_addr_t phys, > > memset(padding_start, 0, padding_size); > > } > > > > + if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) > > + arch_sync_dma_for_device(phys, org_size, dir); > > I think this relies on the swiotlb buffers residing in the linear mapping > (i.e. where phys_to_virt() is reliable), which doesn't look like a safe > assumption to me. No, sorry, ignore me here. I misread swiotlb_bounce(), so I think your change is good. As an aside, it strikes me that we'd probably be better off using uncacheable bounce buffers for non-coherent devices so we could avoid all this maintenance entirely. Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 2/4] dma-iommu: fix arch_sync_dma for map with swiotlb
On Fri, Jul 09, 2021 at 12:35:00PM +0900, David Stevens wrote: > From: David Stevens > > When calling arch_sync_dma, we need to pass it the memory that's > actually being used for dma. When using swiotlb bounce buffers, this is > the bounce buffer. Move arch_sync_dma into the __iommu_dma_map_swiotlb > helper, so it can use the bounce buffer address if necessary. This also > means it is no longer necessary to call iommu_dma_sync_sg_for_device in > iommu_dma_map_sg for untrusted devices. > > Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers") > Signed-off-by: David Stevens > --- > drivers/iommu/dma-iommu.c | 16 +++- > 1 file changed, 7 insertions(+), 9 deletions(-) > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > index eac65302439e..e79e274d2dc5 100644 > --- a/drivers/iommu/dma-iommu.c > +++ b/drivers/iommu/dma-iommu.c > @@ -574,6 +574,9 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device > *dev, phys_addr_t phys, > memset(padding_start, 0, padding_size); > } > > + if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) > + arch_sync_dma_for_device(phys, org_size, dir); I think this relies on the swiotlb buffers residing in the linear mapping (i.e. where phys_to_virt() is reliable), which doesn't look like a safe assumption to me. Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 06/13] HV: Add ghcb hvcall support for SNP VM
On 8/2/2021 8:39 PM, Joerg Roedel wrote: On Wed, Jul 28, 2021 at 10:52:21AM -0400, Tianyu Lan wrote: + hv_ghcb->ghcb.protocol_version = 1; + hv_ghcb->ghcb.ghcb_usage = 1; The values set to ghcb_usage deserve some defines (here and below). OK. Will update in the next version.
Re: [PATCH v2 1/4] dma-iommu: fix sync_sg with swiotlb
On Fri, Jul 09, 2021 at 12:34:59PM +0900, David Stevens wrote: > From: David Stevens > > The is_swiotlb_buffer function takes the physical address of the swiotlb > buffer, not the physical address of the original buffer. The sglist > contains the physical addresses of the original buffer, so for the > sync_sg functions to work properly when a bounce buffer might have been > used, we need to use iommu_iova_to_phys to look up the physical address. > This is what sync_single does, so call that function on each sglist > segment. > > The previous code mostly worked because swiotlb does the transfer on map > and unmap. However, any callers which use DMA_ATTR_SKIP_CPU_SYNC with > sglists or which call sync_sg would not have had anything copied to the > bounce buffer. > > Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers") > Signed-off-by: David Stevens > --- > drivers/iommu/dma-iommu.c | 26 +- > 1 file changed, 13 insertions(+), 13 deletions(-) > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > index 7bcdd1205535..eac65302439e 100644 > --- a/drivers/iommu/dma-iommu.c > +++ b/drivers/iommu/dma-iommu.c > @@ -811,14 +811,14 @@ static void iommu_dma_sync_sg_for_cpu(struct device > *dev, > if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev)) > return; > > - for_each_sg(sgl, sg, nelems, i) { > - if (!dev_is_dma_coherent(dev)) > - arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir); > - > - if (is_swiotlb_buffer(sg_phys(sg))) > + if (dev_is_untrusted(dev)) > + for_each_sg(sgl, sg, nelems, i) > + iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg), > + sg->length, dir); > + else > + for_each_sg(sgl, sg, nelems, i) > swiotlb_sync_single_for_cpu(dev, sg_phys(sg), > sg->length, dir); Doesn't this skip arch_sync_dma_for_cpu() for non-coherent trusted devices? Why not skip the extra dev_is_untrusted(dev) call here and just call iommu_dma_sync_single_for_cpu() for each entry regardless? 
Will
Re: [PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support
On Mon, Aug 02, 2021 at 03:11:40PM +0200, Juergen Gross wrote: > As those cases are all mutually exclusive, wouldn't a static_call() be > the appropriate solution? Right, static_call() is even better, thanks.
Re: [PATCH 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver
On Wed, Jul 28, 2021 at 10:52:28AM -0400, Tianyu Lan wrote: > In Isolation VM, all shared memory with host needs to mark visible > to host via hvcall. vmbus_establish_gpadl() has already done it for > storvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_ > mpb_desc() still need to handle. Use DMA API to map/umap these > memory during sending/receiving packet and Hyper-V DMA ops callback > will use swiotlb function to allocate bounce buffer and copy data > from/to bounce buffer. I am wondering why you don't use the DMA API unconditionally? It provides enough abstraction to do the right thing for isolated and legacy VMs. Regards, Joerg
Re: [PATCH 05/13] HV: Add Write/Read MSR registers via ghcb page
On 8/2/2021 8:28 PM, Joerg Roedel wrote: On Wed, Jul 28, 2021 at 10:52:20AM -0400, Tianyu Lan wrote: +void hv_ghcb_msr_write(u64 msr, u64 value) +{ + union hv_ghcb *hv_ghcb; + void **ghcb_base; + unsigned long flags; + + if (!ms_hyperv.ghcb_base) + return; + + WARN_ON(in_nmi()); + + local_irq_save(flags); + ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base); + hv_ghcb = (union hv_ghcb *)*ghcb_base; + if (!hv_ghcb) { + local_irq_restore(flags); + return; + } + + memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE); Do you really need to zero out the whole 4k? The validation bitmap should be enough, there are no secrets on the page anyway. Same in hv_ghcb_msr_read(). OK. Thanks for suggestion. I will have a try. +enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb, + struct es_em_ctxt *ctxt, + u64 exit_code, u64 exit_info_1, + u64 exit_info_2) { enum es_result ret; @@ -109,7 +109,16 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb, ghcb_set_sw_exit_info_1(ghcb, exit_info_1); ghcb_set_sw_exit_info_2(ghcb, exit_info_2); - sev_es_wr_ghcb_msr(__pa(ghcb)); + /* +* Hyper-V runs paravisor with SEV. Ghcb page is allocated by +* paravisor and not needs to be updated in the Linux guest. +* Otherwise, the ghcb page's PA reported by paravisor is above +* VTOM. Hyper-V use this function with NULL for ctxt point and +* skip setting ghcb page in such case. +*/ + if (ctxt) + sev_es_wr_ghcb_msr(__pa(ghcb)); No, do not make this function work with ctxt==NULL. Instead, factor out a helper function which contains what Hyper-V needs and use that in sev_es_ghcb_hv_call() and Hyper-V code. OK. Will update. +union hv_ghcb { + struct ghcb ghcb; +} __packed __aligned(PAGE_SIZE); I am curious what this will end up being good for. Hyper-V introduces a specific hypercall request in GHCB page and use same union in the Linux Hyper-V code to read/write MSR and call the new hypercall request. 
Re: [PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support
On 02.08.21 14:01, Joerg Roedel wrote: On Wed, Jul 28, 2021 at 08:29:41AM -0700, Dave Hansen wrote: __set_memory_enc_dec() is turning into a real mess. SEV, TDX and now Hyper-V are messing around in here. I was going to suggest a PV_OPS call where the fitting implementation for the guest environment can be plugged in at boot. There is TDX and an SEV(-SNP) case, a Hyper-V case, and likely more coming up from other cloud/hypervisor vendors. Hiding all these behind feature checks is not going to make things cleaner. As those cases are all mutually exclusive, wouldn't a static_call() be the appropriate solution? Juergen
Re: [PATCH 04/13] HV: Mark vmbus ring buffer visible to host in Isolation VM
On 8/2/2021 8:59 PM, Joerg Roedel wrote: On Mon, Aug 02, 2021 at 08:56:29PM +0800, Tianyu Lan wrote: Both second and third are HV_GPADL_RING type. One is send ring and the other is receive ring. The driver keeps the order to allocate rx and tx buffer. You are right this is not robust and will add a mutex to keep the order. Or you introduce fixed indexes for the RX and TX buffers? The interface just allocates a buffer; the driver then configures it as rx or tx after the call.
Re: [PATCH v2 23/24] iommu/arm-smmu: Allow non-strict in pgtable_quirks interface
On Wed, Jul 28, 2021 at 04:58:44PM +0100, Robin Murphy wrote: > To make io-pgtable aware of a flush queue being dynamically enabled, > allow IO_PGTABLE_QUIRK_NON_STRICT to be set even after a domain has been > attached to, and hook up the final piece of the puzzle in iommu-dma. > > Signed-off-by: Robin Murphy > --- > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 15 +++ > drivers/iommu/arm/arm-smmu/arm-smmu.c | 11 +++ > drivers/iommu/dma-iommu.c | 3 +++ > 3 files changed, 29 insertions(+) > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > index 19400826eba7..40fa9cb382c3 100644 > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > @@ -2711,6 +2711,20 @@ static int arm_smmu_enable_nesting(struct iommu_domain > *domain) > return ret; > } > > +static int arm_smmu_set_pgtable_quirks(struct iommu_domain *domain, > + unsigned long quirks) > +{ > + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); > + > + if (quirks == IO_PGTABLE_QUIRK_NON_STRICT && smmu_domain->pgtbl_ops) { > + struct io_pgtable *iop = > io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops); > + > + iop->cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT; > + return 0; > + } > + return -EINVAL; > +} I don't see anything serialising this against a concurrent iommu_unmap(), so the ordering and atomicity looks quite suspicious to me here. I don't think it's just the page-table quirks either, but also setting cookie->fq_domain. Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support
On 8/2/2021 8:01 PM, Joerg Roedel wrote: On Wed, Jul 28, 2021 at 08:29:41AM -0700, Dave Hansen wrote: __set_memory_enc_dec() is turning into a real mess. SEV, TDX and now Hyper-V are messing around in here. I was going to suggest a PV_OPS call where the fitting implementation for the guest environment can be plugged in at boot. There is TDX and an SEV(-SNP) case, a Hyper-V case, and likely more coming up from other cloud/hypervisor vendors. Hiding all these behind feature checks is not going to make things cleaner. Yes, that makes sense. I will do this in the next version.
Re: [PATCH 04/13] HV: Mark vmbus ring buffer visible to host in Isolation VM
On Mon, Aug 02, 2021 at 08:56:29PM +0800, Tianyu Lan wrote: > Both second and third are HV_GPADL_RING type. One is send ring and the > other is receive ring. The driver keeps the order to allocate rx and > tx buffer. You are right this is not robust and will add a mutex to keep > the order. Or you introduce fixed indexes for the RX and TX buffers?
Re: [PATCH 07/13] HV/Vmbus: Add SNP support for VMbus channel initiate message
On Wed, Jul 28, 2021 at 10:52:22AM -0400, Tianyu Lan wrote: > + if (hv_is_isolation_supported()) { > + vmbus_connection.monitor_pages_va[0] > + = vmbus_connection.monitor_pages[0]; > + vmbus_connection.monitor_pages[0] > + = memremap(msg->monitor_page1, HV_HYP_PAGE_SIZE, > +MEMREMAP_WB); > + if (!vmbus_connection.monitor_pages[0]) > + return -ENOMEM; > + > + vmbus_connection.monitor_pages_va[1] > + = vmbus_connection.monitor_pages[1]; > + vmbus_connection.monitor_pages[1] > + = memremap(msg->monitor_page2, HV_HYP_PAGE_SIZE, > +MEMREMAP_WB); > + if (!vmbus_connection.monitor_pages[1]) { > + memunmap(vmbus_connection.monitor_pages[0]); > + return -ENOMEM; > + } > + > + memset(vmbus_connection.monitor_pages[0], 0x00, > +HV_HYP_PAGE_SIZE); > + memset(vmbus_connection.monitor_pages[1], 0x00, > +HV_HYP_PAGE_SIZE); > + } Okay, let me see if I got this right. In Hyper-V Isolation VMs, when the guest wants to make memory shared, it does": - Call to the Hypervisor the mark the pages shared. The Hypervisor will do the RMP update and remap the pages at (VTOM + pa) - The guest maps the memory again into its page-table, so that the entries point to the correct GPA (which is above VTOM now). Or in other words, Hyper-V implements a hardware-independent and configurable c-bit position, as the VTOM value is always power-of-two aligned. Is that correct? This would at least explain why there is no separate allocation/dealloction of memory for the shared range. Thanks, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 07/11] treewide: Replace the use of mem_encrypt_active() with prot_guest_has()
Le 28/07/2021 à 00:26, Tom Lendacky a écrit :
> Replace occurrences of mem_encrypt_active() with calls to prot_guest_has()
> with the PATTR_MEM_ENCRYPT attribute.

What about
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210730114231.23445-1-w...@kernel.org/
?

Christophe

> Cc: Thomas Gleixner
> Cc: Ingo Molnar
> Cc: Borislav Petkov
> Cc: Dave Hansen
> Cc: Andy Lutomirski
> Cc: Peter Zijlstra
> Cc: David Airlie
> Cc: Daniel Vetter
> Cc: Maarten Lankhorst
> Cc: Maxime Ripard
> Cc: Thomas Zimmermann
> Cc: VMware Graphics
> Cc: Joerg Roedel
> Cc: Will Deacon
> Cc: Dave Young
> Cc: Baoquan He
> Signed-off-by: Tom Lendacky
> ---
>  arch/x86/kernel/head64.c                |  4 ++--
>  arch/x86/mm/ioremap.c                   |  4 ++--
>  arch/x86/mm/mem_encrypt.c               |  5 ++---
>  arch/x86/mm/pat/set_memory.c            |  3 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  4 +++-
>  drivers/gpu/drm/drm_cache.c             |  4 ++--
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c     |  4 ++--
>  drivers/gpu/drm/vmwgfx/vmwgfx_msg.c     |  6 +++---
>  drivers/iommu/amd/iommu.c               |  3 ++-
>  drivers/iommu/amd/iommu_v2.c            |  3 ++-
>  drivers/iommu/iommu.c                   |  3 ++-
>  fs/proc/vmcore.c                        |  6 +++---
>  kernel/dma/swiotlb.c                    |  4 ++--
>  13 files changed, 29 insertions(+), 24 deletions(-)
>
> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> index de01903c3735..cafed6456d45 100644
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -19,7 +19,7 @@
>  #include
>  #include
>  #include
> -#include
> +#include
>  #include
>  #include
> @@ -285,7 +285,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
>  	 * there is no need to zero it after changing the memory encryption
>  	 * attribute.
>  	 */
> -	if (mem_encrypt_active()) {
> +	if (prot_guest_has(PATTR_MEM_ENCRYPT)) {
>  		vaddr = (unsigned long)__start_bss_decrypted;
>  		vaddr_end = (unsigned long)__end_bss_decrypted;
>  		for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index 0f2d5ace5986..5e1c1f5cbbe8 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -693,7 +693,7 @@ static bool __init early_memremap_is_setup_data(resource_size_t phys_addr,
>  bool arch_memremap_can_ram_remap(resource_size_t phys_addr, unsigned long size,
>  				 unsigned long flags)
>  {
> -	if (!mem_encrypt_active())
> +	if (!prot_guest_has(PATTR_MEM_ENCRYPT))
>  		return true;
>
>  	if (flags & MEMREMAP_ENC)
> @@ -723,7 +723,7 @@ pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
>  {
>  	bool encrypted_prot;
>
> -	if (!mem_encrypt_active())
> +	if (!prot_guest_has(PATTR_MEM_ENCRYPT))
>  		return prot;
>
>  	encrypted_prot = true;
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 451de8e84fce..0f1533dbe81c 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -364,8 +364,7 @@ int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size)
>  /*
>   * SME and SEV are very similar but they are not the same, so there are
>   * times that the kernel will need to distinguish between SME and SEV. The
> - * sme_active() and sev_active() functions are used for this. When a
> - * distinction isn't needed, the mem_encrypt_active() function can be used.
> + * sme_active() and sev_active() functions are used for this.
>   *
>   * The trampoline code is a good example for this requirement. Before
>   * paging is activated, SME will access all memory as decrypted, but SEV
> @@ -451,7 +450,7 @@ void __init mem_encrypt_free_decrypted_mem(void)
>  	 * The unused memory range was mapped decrypted, change the encryption
>  	 * attribute from decrypted to encrypted before freeing it.
>  	 */
> -	if (mem_encrypt_active()) {
> +	if (sme_me_mask) {
>  		r = set_memory_encrypted(vaddr, npages);
>  		if (r) {
>  			pr_warn("failed to free unused decrypted pages\n");
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index ad8a5c586a35..6925f2bb4be1 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -18,6 +18,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
> @@ -1986,7 +1987,7 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>  	int ret;
>
>  	/* Nothing to do if memory encryption is not active */
> -	if (!mem_encrypt_active())
> +	if (!prot_guest_has(PATTR_MEM_ENCRYPT))
>  		return 0;
>
>  	/* Should not be working on unaligned addresses */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index abb928894eac..8407224717df 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -38,6 +38,7 @@
>  #i
Re: [PATCH 04/13] HV: Mark vmbus ring buffer visible to host in Isolation VM
On 8/2/2021 8:07 PM, Joerg Roedel wrote:
> On Wed, Jul 28, 2021 at 10:52:19AM -0400, Tianyu Lan wrote:
>> +	if (type == HV_GPADL_BUFFER)
>> +		index = 0;
>> +	else
>> +		index = channel->gpadl_range[1].gpadlhandle ? 2 : 1;
>
> Hmm... This doesn't look very robust. Can you set fixed indexes for
> different buffer types? HV_GPADL_BUFFER already has fixed index 0. But
> as it is implemented here you risk that index 2 gets overwritten by
> subsequent calls.

Both second and third are HV_GPADL_RING type. One is send ring and the
other is receive ring. The driver keeps the order to allocate rx and
tx buffer. You are right this is not robust and will add a mutex to
keep the order.
Re: [PATCH 06/13] HV: Add ghcb hvcall support for SNP VM
On Wed, Jul 28, 2021 at 10:52:21AM -0400, Tianyu Lan wrote:
> +	hv_ghcb->ghcb.protocol_version = 1;
> +	hv_ghcb->ghcb.ghcb_usage = 1;

The values set to ghcb_usage deserve some defines (here and below).

> +
> +	hv_ghcb->hypercall.outputgpa = (u64)output;
> +	hv_ghcb->hypercall.hypercallinput.asuint64 = 0;
> +	hv_ghcb->hypercall.hypercallinput.callcode = control;
> +
> +	if (input_size)
> +		memcpy(hv_ghcb->hypercall.hypercalldata, input, input_size);
> +
> +	VMGEXIT();
> +
> +	hv_ghcb->ghcb.ghcb_usage = 0x;

...

> union hv_ghcb {
> 	struct ghcb ghcb;
> +	struct {
> +		u64 hypercalldata[509];
> +		u64 outputgpa;
> +		union {
> +			union {
> +				struct {
> +					u32 callcode	    : 16;
> +					u32 isfast	    : 1;
> +					u32 reserved1	    : 14;
> +					u32 isnested	    : 1;
> +					u32 countofelements : 12;
> +					u32 reserved2	    : 4;
> +					u32 repstartindex   : 12;
> +					u32 reserved3	    : 4;
> +				};
> +				u64 asuint64;
> +			} hypercallinput;
> +			union {
> +				struct {
> +					u16 callstatus;
> +					u16 reserved1;
> +					u32 elementsprocessed : 12;
> +					u32 reserved2	      : 20;
> +				};
> +				u64 asunit64;
> +			} hypercalloutput;
> +		};
> +		u64 reserved2;
> +	} hypercall;

Okay, this answers my previous question :)
Re: [PATCH 01/13] x86/HV: Initialize GHCB page in Isolation VM
Hi Joerg:
	Thanks for your review.

On 8/2/2021 7:53 PM, Joerg Roedel wrote:
> On Wed, Jul 28, 2021 at 10:52:16AM -0400, Tianyu Lan wrote:
>> +static int hyperv_init_ghcb(void)
>> +{
>> +	u64 ghcb_gpa;
>> +	void *ghcb_va;
>> +	void **ghcb_base;
>> +
>> +	if (!ms_hyperv.ghcb_base)
>> +		return -EINVAL;
>> +
>> +	rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
>> +	ghcb_va = memremap(ghcb_gpa, HV_HYP_PAGE_SIZE, MEMREMAP_WB);
>
> This deserves a comment. As I understand it, the GHCB pa is set by
> Hyper-V or the paravisor, so the page does not need to be allocated by
> Linux. And it is not mapped unencrypted because the GHCB page is
> allocated above the VTOM boundary?

You are right. The GHCB page is allocated by the paravisor and its
physical address is above the VTOM boundary. Will add a comment to
describe this. Thanks for the suggestion.

>> @@ -167,6 +190,31 @@ static int hv_cpu_die(unsigned int cpu)
>>  {
>>  	struct hv_reenlightenment_control re_ctrl;
>>  	unsigned int new_cpu;
>> +	unsigned long flags;
>> +	void **input_arg;
>> +	void *pg;
>> +	void **ghcb_va = NULL;
>> +
>> +	local_irq_save(flags);
>> +	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
>> +	pg = *input_arg;
>
> Pg is never used later on, why is it set?

Sorry for the noise. This should be removed during rebase and will be
fixed in the next version.
Re: [PATCH 05/13] HV: Add Write/Read MSR registers via ghcb page
On Wed, Jul 28, 2021 at 10:52:20AM -0400, Tianyu Lan wrote:
> +void hv_ghcb_msr_write(u64 msr, u64 value)
> +{
> +	union hv_ghcb *hv_ghcb;
> +	void **ghcb_base;
> +	unsigned long flags;
> +
> +	if (!ms_hyperv.ghcb_base)
> +		return;
> +
> +	WARN_ON(in_nmi());
> +
> +	local_irq_save(flags);
> +	ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> +	hv_ghcb = (union hv_ghcb *)*ghcb_base;
> +	if (!hv_ghcb) {
> +		local_irq_restore(flags);
> +		return;
> +	}
> +
> +	memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);

Do you really need to zero out the whole 4k? The validation bitmap
should be enough, there are no secrets on the page anyway. Same in
hv_ghcb_msr_read().

> +enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
> +				   struct es_em_ctxt *ctxt,
> +				   u64 exit_code, u64 exit_info_1,
> +				   u64 exit_info_2)
>  {
>  	enum es_result ret;
>
> @@ -109,7 +109,16 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
>  	ghcb_set_sw_exit_info_1(ghcb, exit_info_1);
>  	ghcb_set_sw_exit_info_2(ghcb, exit_info_2);
>
> -	sev_es_wr_ghcb_msr(__pa(ghcb));
> +	/*
> +	 * Hyper-V runs paravisor with SEV. Ghcb page is allocated by
> +	 * paravisor and not needs to be updated in the Linux guest.
> +	 * Otherwise, the ghcb page's PA reported by paravisor is above
> +	 * VTOM. Hyper-V use this function with NULL for ctxt point and
> +	 * skip setting ghcb page in such case.
> +	 */
> +	if (ctxt)
> +		sev_es_wr_ghcb_msr(__pa(ghcb));

No, do not make this function work with ctxt==NULL. Instead, factor out
a helper function which contains what Hyper-V needs and use that in
sev_es_ghcb_hv_call() and Hyper-V code.

> +union hv_ghcb {
> +	struct ghcb ghcb;
> +} __packed __aligned(PAGE_SIZE);

I am curious what this will end up being good for.
Re: [PATCH 04/13] HV: Mark vmbus ring buffer visible to host in Isolation VM
On Wed, Jul 28, 2021 at 10:52:19AM -0400, Tianyu Lan wrote:
> +	if (type == HV_GPADL_BUFFER)
> +		index = 0;
> +	else
> +		index = channel->gpadl_range[1].gpadlhandle ? 2 : 1;

Hmm... This doesn't look very robust. Can you set fixed indexes for
different buffer types? HV_GPADL_BUFFER already has fixed index 0. But
as it is implemented here you risk that index 2 gets overwritten by
subsequent calls.
Re: [PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support
On Wed, Jul 28, 2021 at 08:29:41AM -0700, Dave Hansen wrote:
> __set_memory_enc_dec() is turning into a real mess. SEV, TDX and now
> Hyper-V are messing around in here.

I was going to suggest a PV_OPS call where the fitting implementation
for the guest environment can be plugged in at boot.

There is a TDX and an SEV(-SNP) case, a Hyper-V case, and likely more
coming up from other cloud/hypervisor vendors. Hiding all these behind
feature checks is not going to make things cleaner.

Regards,

	Joerg
Re: [PATCH 01/13] x86/HV: Initialize GHCB page in Isolation VM
On Wed, Jul 28, 2021 at 10:52:16AM -0400, Tianyu Lan wrote:
> +static int hyperv_init_ghcb(void)
> +{
> +	u64 ghcb_gpa;
> +	void *ghcb_va;
> +	void **ghcb_base;
> +
> +	if (!ms_hyperv.ghcb_base)
> +		return -EINVAL;
> +
> +	rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
> +	ghcb_va = memremap(ghcb_gpa, HV_HYP_PAGE_SIZE, MEMREMAP_WB);

This deserves a comment. As I understand it, the GHCB pa is set by
Hyper-V or the paravisor, so the page does not need to be allocated by
Linux. And it is not mapped unencrypted because the GHCB page is
allocated above the VTOM boundary?

> @@ -167,6 +190,31 @@ static int hv_cpu_die(unsigned int cpu)
>  {
>  	struct hv_reenlightenment_control re_ctrl;
>  	unsigned int new_cpu;
> +	unsigned long flags;
> +	void **input_arg;
> +	void *pg;
> +	void **ghcb_va = NULL;
> +
> +	local_irq_save(flags);
> +	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> +	pg = *input_arg;

Pg is never used later on, why is it set?
Re: [PATCH] powerpc/svm: Don't issue ultracalls if !mem_encrypt_active()
Will Deacon writes:
> Commit ad6c00283163 ("swiotlb: Free tbl memory in swiotlb_exit()")
> introduced a set_memory_encrypted() call to swiotlb_exit() so that the
> buffer pages are returned to an encrypted state prior to being freed.
>
> Sachin reports that this leads to the following crash on a Power server:
>
>   [0.010799] software IO TLB: tearing down default memory pool
>   [0.010805] [ cut here ]
>   [0.010808] kernel BUG at arch/powerpc/kernel/interrupt.c:98!
>
> Nick spotted that this is because set_memory_encrypted() is issuing an
> ultracall which doesn't exist for the processor, and should therefore
> be gated by mem_encrypt_active() to mirror the x86 implementation.
>
> Cc: Konrad Rzeszutek Wilk
> Cc: Claire Chang
> Cc: Christoph Hellwig
> Cc: Robin Murphy
> Fixes: ad6c00283163 ("swiotlb: Free tbl memory in swiotlb_exit()")
> Suggested-by: Nicholas Piggin
> Reported-by: Sachin Sant
> Tested-by: Sachin Sant
> Tested-by: Nathan Chancellor
> Link: https://lore.kernel.org/r/1905cd70-7656-42ae-99e2-a31fc3812...@linux.vnet.ibm.com/
> Signed-off-by: Will Deacon
> ---
>  arch/powerpc/platforms/pseries/svm.c | 6 ++
>  1 file changed, 6 insertions(+)

Thanks.

Acked-by: Michael Ellerman

I assume Konrad will take this via the swiotlb tree?

cheers
Re: [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> On 2021-07-28 19:30, Georgi Djakov wrote:
>> On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
>>> commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
>>> removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
>>> the memory type setting required for the non-coherent masters to use
>>> system cache. Now that system cache support for GPU is added, we will
>>> need to set the right PTE attribute for GPU buffers to be sys cached.
>>> Without this, the system cache lines are not allocated for GPU.
>>>
>>> So the patches in this series introduces a new prot flag IOMMU_LLC,
>>> renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
>>> and makes GPU the user of this protection flag.
>>
>> Thank you for the patchset! Are you planning to refresh it, as it does
>> not apply anymore?
>
> I was waiting on Will's reply [1]. If there are no changes needed, then
> I can repost the patch.

I still think you need to handle the mismatched alias, no? You're adding
a new memory type to the SMMU which doesn't exist on the CPU side. That
can't be right.

Will
Re: [PATCH 08/11] mm: Remove the now unused mem_encrypt_active() function
On Tue, Jul 27, 2021 at 05:26:11PM -0500, Tom Lendacky wrote:
> The mem_encrypt_active() function has been replaced by prot_guest_has(),
> so remove the implementation.
>
> Signed-off-by: Tom Lendacky

Reviewed-by: Joerg Roedel
Re: [PATCH 09/11] x86/sev: Remove the now unused mem_encrypt_active() function
On Tue, Jul 27, 2021 at 05:26:12PM -0500, Tom Lendacky wrote:
> The mem_encrypt_active() function has been replaced by prot_guest_has(),
> so remove the implementation.
>
> Cc: Thomas Gleixner
> Cc: Ingo Molnar
> Cc: Borislav Petkov
> Signed-off-by: Tom Lendacky

Reviewed-by: Joerg Roedel
Re: [PATCH 06/11] x86/sev: Replace occurrences of sev_es_active() with prot_guest_has()
On Tue, Jul 27, 2021 at 05:26:09PM -0500, Tom Lendacky wrote:
> @@ -48,7 +47,7 @@ static void sme_sev_setup_real_mode(struct trampoline_header *th)
>  	if (prot_guest_has(PATTR_HOST_MEM_ENCRYPT))
>  		th->flags |= TH_FLAGS_SME_ACTIVE;
>
> -	if (sev_es_active()) {
> +	if (prot_guest_has(PATTR_GUEST_PROT_STATE)) {
>  		/*
>  		 * Skip the call to verify_cpu() in secondary_startup_64 as it
>  		 * will cause #VC exceptions when the AP can't handle them yet.

Not sure how TDX will handle AP booting, are you sure it needs this
special setup as well? Otherwise a check for SEV-ES would be better
instead of the generic PATTR_GUEST_PROT_STATE.

Regards,

	Joerg
Re: [PATCH 05/11] x86/sev: Replace occurrences of sev_active() with prot_guest_has()
On Tue, Jul 27, 2021 at 05:26:08PM -0500, Tom Lendacky wrote:
> Replace occurrences of sev_active() with the more generic prot_guest_has()
> using PATTR_GUEST_MEM_ENCRYPT, except for in arch/x86/mm/mem_encrypt*.c
> where PATTR_SEV will be used. If future support is added for other memory
> encryption technologies, the use of PATTR_GUEST_MEM_ENCRYPT can be
> updated, as required, to use PATTR_SEV.
>
> Cc: Thomas Gleixner
> Cc: Ingo Molnar
> Cc: Borislav Petkov
> Cc: Dave Hansen
> Cc: Andy Lutomirski
> Cc: Peter Zijlstra
> Cc: Ard Biesheuvel
> Signed-off-by: Tom Lendacky

Reviewed-by: Joerg Roedel
Re: [PATCH 04/11] x86/sme: Replace occurrences of sme_active() with prot_guest_has()
On Tue, Jul 27, 2021 at 05:26:07PM -0500, Tom Lendacky wrote:
> Replace occurrences of sme_active() with the more generic prot_guest_has()
> using PATTR_HOST_MEM_ENCRYPT, except for in arch/x86/mm/mem_encrypt*.c
> where PATTR_SME will be used. If future support is added for other memory
> encryption technologies, the use of PATTR_HOST_MEM_ENCRYPT can be
> updated, as required, to use PATTR_SME.
>
> Cc: Thomas Gleixner
> Cc: Ingo Molnar
> Cc: Borislav Petkov
> Cc: Dave Hansen
> Cc: Andy Lutomirski
> Cc: Peter Zijlstra
> Cc: Joerg Roedel
> Cc: Will Deacon
> Signed-off-by: Tom Lendacky

Reviewed-by: Joerg Roedel
Re: [PATCH 02/11] x86/sev: Add an x86 version of prot_guest_has()
On Tue, Jul 27, 2021 at 05:26:05PM -0500, Tom Lendacky wrote:
> Introduce an x86 version of the prot_guest_has() function. This will be
> used in the more generic x86 code to replace vendor specific calls like
> sev_active(), etc.
>
> While the name suggests this is intended mainly for guests, it will
> also be used for host memory encryption checks in place of sme_active().
>
> The amd_prot_guest_has() function does not use EXPORT_SYMBOL_GPL for the
> same reasons previously stated when changing sme_active(), sev_active and
> sme_me_mask to EXPORT_SYMBOL:
>   commit 87df26175e67 ("x86/mm: Unbreak modules that rely on external
>   PAGE_KERNEL availability")
>   commit 9d5f38ba6c82 ("x86/mm: Unbreak modules that use the DMA API")
>
> Cc: Thomas Gleixner
> Cc: Ingo Molnar
> Cc: Borislav Petkov
> Cc: Dave Hansen
> Cc: Andy Lutomirski
> Cc: Peter Zijlstra
> Co-developed-by: Andi Kleen
> Signed-off-by: Andi Kleen
> Co-developed-by: Kuppuswamy Sathyanarayanan
> Signed-off-by: Kuppuswamy Sathyanarayanan
> Signed-off-by: Tom Lendacky

Reviewed-by: Joerg Roedel
Re: [PATCH 01/11] mm: Introduce a function to check for virtualization protection features
On Tue, Jul 27, 2021 at 05:26:04PM -0500, Tom Lendacky wrote:
> In prep for other protected virtualization technologies, introduce a
> generic helper function, prot_guest_has(), that can be used to check
> for specific protection attributes, like memory encryption. This is
> intended to eliminate having to add multiple technology-specific checks
> to the code (e.g. if (sev_active() || tdx_active())).
>
> Co-developed-by: Andi Kleen
> Signed-off-by: Andi Kleen
> Co-developed-by: Kuppuswamy Sathyanarayanan
> Signed-off-by: Kuppuswamy Sathyanarayanan
> Signed-off-by: Tom Lendacky

Reviewed-by: Joerg Roedel
Re: [PATCH v2 13/21] s390/pci: don't set failed sg dma_address to DMA_MAPPING_ERROR
On Fri, 2021-07-23 at 11:50 -0600, Logan Gunthorpe wrote:
> Setting the ->dma_address to DMA_MAPPING_ERROR is not part of
> the ->map_sg calling convention, so remove it.
>
> Link: https://lore.kernel.org/linux-mips/20210716063241.gc13...@lst.de/
> Suggested-by: Christoph Hellwig
> Signed-off-by: Logan Gunthorpe
> Cc: Niklas Schnelle
> Cc: Gerald Schaefer
> Cc: Heiko Carstens
> Cc: Vasily Gorbik
> Cc: Christian Borntraeger
> ---
>  arch/s390/pci/pci_dma.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
> index c78b02012764..be48e5b5bfcf 100644
> --- a/arch/s390/pci/pci_dma.c
> +++ b/arch/s390/pci/pci_dma.c
> @@ -492,7 +492,6 @@ static int s390_dma_map_sg(struct device *dev, struct scatterlist *sg,
>  	for (i = 1; i < nr_elements; i++) {
>  		s = sg_next(s);
>
> -		s->dma_address = DMA_MAPPING_ERROR;
>  		s->dma_length = 0;
>
>  		if (s->offset || (size & ~PAGE_MASK) ||

Acked-by: Niklas Schnelle

Thanks!
Re: [PATCH v7 00/12] Clean up "mediatek,larb"
On Fri, Jul 30, 2021 at 10:52:26AM +0800, Yong Wu wrote:
>  .../display/mediatek/mediatek,disp.txt        |  9 ----
>  .../bindings/media/mediatek-jpeg-decoder.yaml |  9 ----
>  .../bindings/media/mediatek-jpeg-encoder.yaml |  9 ----
>  .../bindings/media/mediatek-mdp.txt           |  8 ----
>  .../bindings/media/mediatek-vcodec.txt        |  4 --
>  arch/arm/boot/dts/mt2701.dtsi                 |  2 -
>  arch/arm/boot/dts/mt7623n.dtsi                |  5 --
>  arch/arm64/boot/dts/mediatek/mt8173.dtsi      | 16 ---
>  arch/arm64/boot/dts/mediatek/mt8183.dtsi      |  6 ---
>  drivers/gpu/drm/mediatek/mtk_disp_ovl.c       |  9 +++-
>  drivers/gpu/drm/mediatek/mtk_disp_rdma.c      |  9 +++-
>  drivers/gpu/drm/mediatek/mtk_drm_crtc.c       | 15 +++---
>  drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c   | 36 +--
>  drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.h   |  1 -
>  drivers/gpu/drm/mediatek/mtk_drm_drv.c        |  5 +-
>  drivers/iommu/mtk_iommu.c                     | 24 +-
>  drivers/iommu/mtk_iommu_v1.c                  | 31 -
>  .../media/platform/mtk-jpeg/mtk_jpeg_core.c   | 45 +-
>  .../media/platform/mtk-jpeg/mtk_jpeg_core.h   |  2 -
>  drivers/media/platform/mtk-mdp/mtk_mdp_comp.c | 46 +--
>  drivers/media/platform/mtk-mdp/mtk_mdp_comp.h |  2 -
>  drivers/media/platform/mtk-mdp/mtk_mdp_core.c |  1 -
>  .../platform/mtk-vcodec/mtk_vcodec_dec_pm.c   | 37 ++-
>  .../platform/mtk-vcodec/mtk_vcodec_drv.h      |  3 --
>  .../platform/mtk-vcodec/mtk_vcodec_enc.c      |  1 -
>  .../platform/mtk-vcodec/mtk_vcodec_enc_pm.c   | 44 ++
>  drivers/memory/mtk-smi.c                      | 14 --
>  include/soc/mediatek/smi.h                    | 20
>  28 files changed, 92 insertions(+), 321 deletions(-)

So this is likely not going through the IOMMU tree. Given Matthias has
reviewed the IOMMU changes, you can add my

Acked-by: Joerg Roedel
Re: [PATCH v6 0/7] iommu/amd: Enable page-selective flushes
On Fri, Jul 23, 2021 at 02:32:02AM -0700, Nadav Amit wrote:
> Nadav Amit (6):
>   iommu/amd: Selective flush on unmap
>   iommu/amd: Do not use flush-queue when NpCache is on
>   iommu: Factor iommu_iotlb_gather_is_disjoint() out
>   iommu/amd: Tailored gather logic for AMD
>   iommu/amd: Sync once for scatter-gather operations
>   iommu/amd: Use only natural aligned flushes in a VM
>
> Robin Murphy (1):
>   iommu: Improve iommu_iotlb_gather helpers

Applied, thanks Nadav.
Re: [PATCH v6 0/9] ACPI/IORT: Support for IORT RMR node
On Tue, Jul 27, 2021 at 06:51:56AM +, Shameerali Kolothum Thodi wrote:
> A gentle ping on this...

This needs more reviews, and please add Will Deacon when you post the
next version of this patch-set.

Regards,

	Joerg