[PATCH AUTOSEL 5.4 13/46] dma-mapping: Fix dma_pgprot() for unencrypted coherent pages
From: Thomas Hellstrom

[ Upstream commit 17c4a2ae15a7aaefe84bdb271952678c5c9cd8e1 ]

When dma_mmap_coherent() sets up a mapping to unencrypted coherent memory
under SEV encryption, and sometimes under SME encryption, it will actually
set up an encrypted mapping rather than an unencrypted one, causing devices
that DMA from that memory to read encrypted contents. Fix this.

When force_dma_unencrypted() returns true, the linear kernel map of the
coherent pages has had the encryption bit explicitly cleared and the page
content is unencrypted. Make sure that any additional PTEs we set up to
these pages also have the encryption bit cleared by having dma_pgprot()
return a protection with the encryption bit cleared in this case.

Signed-off-by: Thomas Hellstrom
Signed-off-by: Borislav Petkov
Reviewed-by: Christoph Hellwig
Acked-by: Tom Lendacky
Link: https://lkml.kernel.org/r/20200304114527.3636-3-thomas...@shipmail.org
Signed-off-by: Sasha Levin
---
 kernel/dma/mapping.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index d9334f31a5afb..8682a5305cb36 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -169,6 +169,8 @@ EXPORT_SYMBOL(dma_get_sgtable_attrs);
  */
 pgprot_t dma_pgprot(struct device *dev, pgprot_t prot, unsigned long attrs)
 {
+	if (force_dma_unencrypted(dev))
+		prot = pgprot_decrypted(prot);
 	if (dev_is_dma_coherent(dev) ||
 	    (IS_ENABLED(CONFIG_DMA_NONCOHERENT_CACHE_SYNC) &&
              (attrs & DMA_ATTR_NON_CONSISTENT)))
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
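The effect of the two added lines can be illustrated with a small userspace toy model. This is a sketch only: `TOY_PAGE_ENC` is a made-up stand-in for the SME/SEV C-bit, and the bit positions are hypothetical, not the real x86 page-table encodings; only the control flow mirrors the patched `dma_pgprot()`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical protection bits; TOY_PAGE_ENC stands in for the C-bit
 * that marks a mapping as encrypted under SME/SEV. */
#define TOY_PAGE_PRESENT (1ull << 0)
#define TOY_PAGE_RW      (1ull << 1)
#define TOY_PAGE_ENC     (1ull << 51)

typedef uint64_t toy_pgprot_t;

/* Models pgprot_decrypted(): return the protection with the
 * encryption attribute cleared. */
static toy_pgprot_t toy_pgprot_decrypted(toy_pgprot_t prot)
{
	return prot & ~TOY_PAGE_ENC;
}

/* Models the patched dma_pgprot(): when the device requires
 * unencrypted DMA memory, clear the encryption bit before the
 * protection is used to build any additional mapping, so the new
 * PTEs match the (already decrypted) linear map of the pages. */
static toy_pgprot_t toy_dma_pgprot(int force_dma_unencrypted,
				   toy_pgprot_t prot)
{
	if (force_dma_unencrypted)
		prot = toy_pgprot_decrypted(prot);
	return prot;
}
```

Before the fix, the encryption bit was left set, which is exactly the mismatch that made devices read ciphertext from coherent buffers.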
RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace
Hi Eric,

> From: Auger Eric
> Sent: Friday, April 10, 2020 11:28 AM
> To: Liu, Yi L ; Jean-Philippe Brucker
> Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to
> userspace
>
> Hi Yi,
>
> On 4/9/20 2:47 PM, Liu, Yi L wrote:
> > Hi Jean,
> >
> >> From: Jean-Philippe Brucker
> >> Sent: Thursday, April 9, 2020 4:15 PM
> >> Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1
> >> format to userspace
> >>
> >> On Wed, Apr 08, 2020 at 12:27:58PM +0200, Auger Eric wrote:
> >>> Hi Yi,
> >>>
> >>> On 4/7/20 11:43 AM, Liu, Yi L wrote:
> Hi Jean,
>
> > From: Jean-Philippe Brucker
> > Sent: Friday, April 3, 2020 4:23 PM
> > To: Auger Eric
> > userspace
> >
> > On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote:
> > header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
> > 		VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
> > @@ -2254,6 +2309,7 @@ static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
> > 	/* nesting iommu type supports PASID requests (alloc/free) */
> > 	nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS;
> What is the meaning for ARM?
> >>>
> >>> I think it's just a software capability exposed to userspace; on the
> >>> userspace side, it has a choice to use it or not. :-) The reason to
> >>> define it and report it in the nesting cap is that I'd like to make
> >>> the pasid alloc/free be available just for an IOMMU with type
> >>> VFIO_IOMMU_TYPE1_NESTING. Please feel free to tell me if it is not
> >>> good for ARM. We can find a proper way to report the availability.
> >>
> >> Well it is more a question for Jean-Philippe. Do we have a system
> >> wide PASID allocation on ARM?
> >
> > We don't, the PASID spaces are per-VM on Arm, so this function should
> > consult the IOMMU driver before setting flags. As you said on patch 3,
> > nested doesn't necessarily imply PASID support. The SMMUv2 does not
> > support PASID but does support nesting stages 1 and 2 for the IOVA
> > space. SMMUv3 support of PASID depends on HW capabilities. So I think
> > this needs to be finer grained:
> >
> > Does the container support:
> > * VFIO_IOMMU_PASID_REQUEST?
> >   -> Yes for VT-d 3
> >   -> No for Arm SMMU
> > * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL?
> >   -> Yes for VT-d 3
> >   -> Sometimes for SMMUv2
> >   -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler
> >      due to PASID tables being in GPA space.)
> > * VFIO_IOMMU_BIND_PASID_TABLE?
> >   -> No for VT-d
> >   -> Sometimes for SMMUv3
> >
> > Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support.
>
> good summary. do you expect to see any
>
> > 	+ nesting_cap->stage1_formats = formats;
> as spotted by Kevin, since a single format is supported, rename
> >>>
> >>> ok, I was believing it may be possible on ARM or so. :-) will
> >>> rename it.
>
> > Yes, I don't think a u32 is going to cut it for Arm :( We need to
> > describe all sorts of capabilities for page and PASID tables
> > (granules, GPA size, ASID/PASID size, HW access/dirty, etc etc.)
> > Just saying "Arm stage-1 format" wouldn't mean much. I guess we could
> > have a secondary vendor capability for these?
>
> Actually, I'm wondering if we can define some formats to stand for a
> set of capabilities. e.g. VTD_STAGE1_FORMAT_V1, which may indicate the
> 1st-level page table related caps (aw, a/d, SRE, EA and etc.). And the
> vIOMMU can parse the capabilities.
> >>>
> >>> But eventually do we really need all those capability getters? I
> >>> mean, can't we simply rely on the actual call to
> >>> VFIO_IOMMU_BIND_GUEST_PGTBL() to detect any mismatch? Definitely
> >>> the error handling may be heavier on userspace, but can't we manage?
> >>
> >> I think we need to present these capabilities at boot time, long
> >> before the guest triggers a bind(). For example if the host SMMU
> >> doesn't support 16-bit ASID, we need to communicate that to the guest
> >> using vSMMU ID registers or PROBE properties. Otherwise a bind() will
> >> succeed, but if the guest uses 16-bit ASIDs in its CD, DMA will
> >> result in C_BAD_CD events which we'll inject into the guest, for no
> >> apparent reason from their perspective.
> >>
> >> In addition some VMMs may have fallbacks if shared page tables are
> >> not available. They could fall back to a MAP/UNMAP interface, or
> >> simply not present a vIOMMU to the guest.
> >>
> >
> > Based on the comments, I think there would be a need to report iommu
> > caps in detail. So I guess the iommu uapi needs to provide something
> > like the vfio cap chain in the iommu uapi. Please feel free to let me
> > know your thoughts. :-)
> >
Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace
Hi Yi,

On 4/9/20 2:47 PM, Liu, Yi L wrote:
> Hi Jean,
>
>> From: Jean-Philippe Brucker
>> Sent: Thursday, April 9, 2020 4:15 PM
>> Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to
>> userspace
>>
>> On Wed, Apr 08, 2020 at 12:27:58PM +0200, Auger Eric wrote:
>>> Hi Yi,
>>>
>>> On 4/7/20 11:43 AM, Liu, Yi L wrote:
Hi Jean,

> From: Jean-Philippe Brucker
> Sent: Friday, April 3, 2020 4:23 PM
> To: Auger Eric
> userspace
>
> On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote:
> header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
> 		VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
> @@ -2254,6 +2309,7 @@ static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
> 	/* nesting iommu type supports PASID requests (alloc/free) */
> 	nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS;
What is the meaning for ARM?
>>>
>>> I think it's just a software capability exposed to userspace; on the
>>> userspace side, it has a choice to use it or not. :-) The reason to
>>> define it and report it in the nesting cap is that I'd like to make the
>>> pasid alloc/free be available just for an IOMMU with type
>>> VFIO_IOMMU_TYPE1_NESTING. Please feel free to tell me if it is not good
>>> for ARM. We can find a proper way to report the availability.
>>
>> Well it is more a question for Jean-Philippe. Do we have a system wide
>> PASID allocation on ARM?
>
> We don't, the PASID spaces are per-VM on Arm, so this function should
> consult the IOMMU driver before setting flags. As you said on patch 3,
> nested doesn't necessarily imply PASID support. The SMMUv2 does not
> support PASID but does support nesting stages 1 and 2 for the IOVA
> space. SMMUv3 support of PASID depends on HW capabilities. So I think
> this needs to be finer grained:
>
> Does the container support:
> * VFIO_IOMMU_PASID_REQUEST?
>   -> Yes for VT-d 3
>   -> No for Arm SMMU
> * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL?
>   -> Yes for VT-d 3
>   -> Sometimes for SMMUv2
>   -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler
>      due to PASID tables being in GPA space.)
> * VFIO_IOMMU_BIND_PASID_TABLE?
>   -> No for VT-d
>   -> Sometimes for SMMUv3
>
> Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support.

good summary. do you expect to see any

> 	+ nesting_cap->stage1_formats = formats;
as spotted by Kevin, since a single format is supported, rename
>>>
>>> ok, I was believing it may be possible on ARM or so. :-) will rename
>>> it.
>
> Yes, I don't think a u32 is going to cut it for Arm :( We need to
> describe all sorts of capabilities for page and PASID tables (granules,
> GPA size, ASID/PASID size, HW access/dirty, etc etc.) Just saying "Arm
> stage-1 format" wouldn't mean much. I guess we could have a secondary
> vendor capability for these?

Actually, I'm wondering if we can define some formats to stand for a set
of capabilities. e.g. VTD_STAGE1_FORMAT_V1, which may indicate the
1st-level page table related caps (aw, a/d, SRE, EA and etc.). And the
vIOMMU can parse the capabilities.
>>>
>>> But eventually do we really need all those capability getters? I mean,
>>> can't we simply rely on the actual call to VFIO_IOMMU_BIND_GUEST_PGTBL()
>>> to detect any mismatch? Definitely the error handling may be heavier
>>> on userspace, but can't we manage?
>>
>> I think we need to present these capabilities at boot time, long before
>> the guest triggers a bind(). For example if the host SMMU doesn't support
>> 16-bit ASID, we need to communicate that to the guest using vSMMU ID
>> registers or PROBE properties. Otherwise a bind() will succeed, but if the
>> guest uses 16-bit ASIDs in its CD, DMA will result in C_BAD_CD events
>> which we'll inject into the guest, for no apparent reason from their
>> perspective.
>>
>> In addition some VMMs may have fallbacks if shared page tables are not
>> available. They could fall back to a MAP/UNMAP interface, or simply not
>> present a vIOMMU to the guest.
>>
>
> Based on the comments, I think there would be a need to report iommu caps
> in detail. So I guess the iommu uapi needs to provide something like the
> vfio cap chain in the iommu uapi. Please feel free to let me know your
> thoughts. :-)

Yes, to me it sounds sensible.

> In vfio, we can define a cap as below:
>
> struct vfio_iommu_type1_info_cap_nesting {
> 	struct vfio_info_cap_header header;
> 	__u64 iommu_model;
> #define VFIO_IOMMU_PASID_REQS	(1 << 0)

I still think the name shall be changed

> #define VFIO_IOMMU_BIND_GPASID	(1 << 1)
> #define VFIO_IOMMU_CACHE_INV	(1
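For reference, the VFIO cap chain mentioned above is a linked list of capability headers inside the info buffer, each pointing at the next by byte offset. The sketch below models that walk in userspace C. The `vfio_info_cap_header` layout matches the real uAPI in `linux/vfio.h`; the capability IDs and the nesting capability itself are from the proposal in this thread, so treat those as assumptions rather than merged uAPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as struct vfio_info_cap_header in linux/vfio.h */
struct vfio_info_cap_header {
	uint16_t id;
	uint16_t version;
	uint32_t next;	/* byte offset of the next capability, 0 = end */
};

/*
 * Walk the capability chain in an info buffer returned by an ioctl such
 * as VFIO_IOMMU_GET_INFO. cap_offset is the offset of the first
 * capability from the start of the buffer.
 */
static struct vfio_info_cap_header *
vfio_find_cap(void *buf, uint32_t cap_offset, uint16_t id)
{
	while (cap_offset) {
		struct vfio_info_cap_header *hdr =
			(struct vfio_info_cap_header *)((char *)buf + cap_offset);

		if (hdr->id == id)
			return hdr;
		cap_offset = hdr->next;
	}
	return NULL;
}
```

A VMM would use this to look up a (hypothetical) nesting capability ID and then inspect the model/format fields before deciding whether to expose a vIOMMU.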
Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc
On (20/04/09 10:08), Minchan Kim wrote:
> > > Even though I don't know how many usecases we have using zsmalloc as
> > > a module (I heard of only one, for a dumb reason), it could affect
> > > existing users. Thus, please include a concrete explanation in the
> > > patch to justify when the complaint occurs.
> >
> > The justification is 'we can unexport functions that have no sane
> > reason of being exported in the first place'.
> >
> > The changelog pretty much says that.
>
> Okay, I hope there is no affected user since this patch.
> If there is someone, they need to provide a sane reason why they want
> to have zsmalloc as a module.

I'm one of those who use zsmalloc as a module - mainly because I use zram
as a compressing general-purpose block device, not as a swap device. I
create zram0, mkfs, mount, check out and compile code, and once done -
umount, rmmod. This reduces the number of writes to the SSD. Some people
use tmpfs, but zram device(-s) can be much larger in size.

That's a niche use case and I'm not against the patch.

	-ss
Re: [PATCH 1/1] iommu/vt-d: use DMA domain for real DMA devices and subdevices
Hi,

On 2020/4/10 3:17, Jon Derrick wrote:
> The PCI devices handled by intel-iommu may have a DMA requester on
> another bus, such as VMD subdevices needing to use the VMD endpoint.
>
> The real DMA device is now used for the DMA mapping, but one case was
> missed earlier: if the VMD device (and hence subdevices too) are under
> IOMMU_DOMAIN_IDENTITY, mappings do not work.
>
> Codepaths like intel_map_page() handle the IOMMU_DOMAIN_DMA case by
> creating an iommu DMA mapping, and fall back on dma_direct_map_page()
> for the IOMMU_DOMAIN_IDENTITY case. However, handling of the IDENTITY
> case is broken when intel_map_page() handles a subdevice.
>
> We observe that at iommu attach time, dmar_insert_one_dev_info() for
> the subdevices will never set dev->archdata.iommu. This is because that
> function

Do you mind telling why it is not set?

> uses find_domain() to check if there is already an IOMMU for the
> device, and find_domain() then defers to the real DMA device which does
> have one. Thus dmar_insert_one_dev_info() returns without assigning
> dev->archdata.iommu.
>
> Then, later:
>
> 1. intel_map_page() checks if an IOMMU mapping is needed by calling
>    iommu_need_mapping() on the subdevice. identity_mapping() returns
>    false because dev->archdata.iommu is NULL, so this function returns
>    false, indicating that mapping is needed.
> 2. __intel_map_single() is called to create the mapping.
> 3. __intel_map_single() calls find_domain(). This function now returns
>    the IDENTITY domain corresponding to the real DMA device.
> 4. __intel_map_single() calls domain_get_iommu() on this "real" domain.
>    A failure is hit and the entire operation is aborted, because this
>    codepath is not intended to handle IDENTITY mappings:
>
> 	if (WARN_ON(domain->domain.type != IOMMU_DOMAIN_DMA))
> 		return NULL;

This is caused by the fragile private domain implementation. We are in
the process of removing it by enhancing the iommu subsystem with
per-group default domains.

https://www.spinics.net/lists/iommu/msg42976.html

So ultimately VMD subdevices should have their own per-device iommu data
and support per-device dma ops.

Best regards,
baolu
[PATCH v6 3/5] drm/msm: Attach the IOMMU device during initialization
Everywhere an IOMMU object is created by msm_gpu_create_address_space the
IOMMU device is attached immediately after. Instead of carrying around the
infrastructure to do the attach from the device-specific code, do it
directly in the msm_iommu_init() function. This gets it out of the way for
more aggressive cleanups that follow.

Reviewed-by: Rob Clark
Signed-off-by: Jordan Crouse
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c  |  8 
 drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c |  4 
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c |  7 ---
 drivers/gpu/drm/msm/msm_gem_vma.c        | 23 +++
 drivers/gpu/drm/msm/msm_gpu.c            | 11 +--
 drivers/gpu/drm/msm/msm_gpummu.c         |  6 --
 drivers/gpu/drm/msm/msm_iommu.c          | 15 +++
 drivers/gpu/drm/msm/msm_mmu.h            |  1 -
 8 files changed, 27 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
index ce19f1d39367..6629a142574e 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
@@ -772,7 +772,6 @@ static int _dpu_kms_mmu_init(struct dpu_kms *dpu_kms)
 {
 	struct iommu_domain *domain;
 	struct msm_gem_address_space *aspace;
-	int ret;
 
 	domain = iommu_domain_alloc(&platform_bus_type);
 	if (!domain)
@@ -788,13 +787,6 @@ static int _dpu_kms_mmu_init(struct dpu_kms *dpu_kms)
 		return PTR_ERR(aspace);
 	}
 
-	ret = aspace->mmu->funcs->attach(aspace->mmu);
-	if (ret) {
-		DPU_ERROR("failed to attach iommu %d\n", ret);
-		msm_gem_address_space_put(aspace);
-		return ret;
-	}
-
 	dpu_kms->base.aspace = aspace;
 	return 0;
 }
diff --git a/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c b/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c
index dda05436f716..9dba37c6344f 100644
--- a/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c
+++ b/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c
@@ -518,10 +518,6 @@ struct msm_kms *mdp4_kms_init(struct drm_device *dev)
 		}
 
 		kms->aspace = aspace;
-
-		ret = aspace->mmu->funcs->attach(aspace->mmu);
-		if (ret)
-			goto fail;
 	} else {
 		DRM_DEV_INFO(dev->dev, "no iommu, fallback to phys "
 				"contig buffers for scanout\n");
diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
index 47b989834af1..1e9ba99fd9eb 100644
--- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
+++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
@@ -644,13 +644,6 @@ struct msm_kms *mdp5_kms_init(struct drm_device *dev)
 		}
 
 		kms->aspace = aspace;
-
-		ret = aspace->mmu->funcs->attach(aspace->mmu);
-		if (ret) {
-			DRM_DEV_ERROR(&pdev->dev, "failed to attach iommu: %d\n",
-				ret);
-			goto fail;
-		}
 	} else {
 		DRM_DEV_INFO(&pdev->dev, "no iommu, fallback to phys contig buffers for scanout\n");
diff --git a/drivers/gpu/drm/msm/msm_gem_vma.c b/drivers/gpu/drm/msm/msm_gem_vma.c
index 1af5354bcd46..91d993a16850 100644
--- a/drivers/gpu/drm/msm/msm_gem_vma.c
+++ b/drivers/gpu/drm/msm/msm_gem_vma.c
@@ -131,8 +131,8 @@ msm_gem_address_space_create(struct device *dev, struct iommu_domain *domain,
 		const char *name)
 {
 	struct msm_gem_address_space *aspace;
-	u64 size = domain->geometry.aperture_end -
-		domain->geometry.aperture_start;
+	u64 start = domain->geometry.aperture_start;
+	u64 size = domain->geometry.aperture_end - start;
 
 	aspace = kzalloc(sizeof(*aspace), GFP_KERNEL);
 	if (!aspace)
@@ -141,9 +141,18 @@ msm_gem_address_space_create(struct device *dev, struct iommu_domain *domain,
 	spin_lock_init(&aspace->lock);
 	aspace->name = name;
 	aspace->mmu = msm_iommu_new(dev, domain);
+	if (IS_ERR(aspace->mmu)) {
+		int ret = PTR_ERR(aspace->mmu);
 
-	drm_mm_init(&aspace->mm, (domain->geometry.aperture_start >> PAGE_SHIFT),
-		size >> PAGE_SHIFT);
+		kfree(aspace);
+		return ERR_PTR(ret);
+	}
+
+	/*
+	 * Attaching the IOMMU device changes the aperture values so use the
+	 * cached values instead
+	 */
+	drm_mm_init(&aspace->mm, start >> PAGE_SHIFT, size >> PAGE_SHIFT);
 
 	kref_init(&aspace->kref);
 
@@ -164,6 +173,12 @@ msm_gem_address_space_create_a2xx(struct device *dev, struct msm_gpu *gpu,
 	spin_lock_init(&aspace->lock);
 	aspace->name = name;
 	aspace->mmu = msm_gpummu_new(dev, gpu);
+	if (IS_ERR(aspace->mmu)) {
+		int ret = PTR_ERR(aspace->mmu);
+
+		kfree(aspace);
+		return ERR_PTR(ret);
+	}
 
 	drm_mm_init(&aspace->mm, (va_start >>
[PATCH v6 2/5] iommu/arm-smmu: Add support for TTBR1
Add support to enable TTBR1 if the domain requests it via the
DOMAIN_ATTR_SPLIT_TABLES attribute. If enabled by the hardware and
pagetable configuration, the driver will configure the TTBR1 region and
program the domain pagetable on TTBR1. TTBR0 will be disabled.

After attaching the device, the value of the domain attribute can be
queried to see if the split pagetables were successfully programmed. The
domain geometry will be updated as well so that the caller can determine
the active region for the pagetable that was programmed.

Signed-off-by: Jordan Crouse
---
 drivers/iommu/arm-smmu.c | 48 ++--
 drivers/iommu/arm-smmu.h | 24 +++-
 2 files changed, 59 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index a6a5796e9c41..db6d503c1673 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -555,11 +555,16 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
 			cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr;
 			cb->ttbr[1] = 0;
 		} else {
-			cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
-			cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID,
-						  cfg->asid);
-			cb->ttbr[1] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
-						 cfg->asid);
+			cb->ttbr[0] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
+						 cfg->asid);
+
+			if (pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
+				cb->ttbr[1] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+			} else {
+				cb->ttbr[0] |= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+				cb->ttbr[1] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
+							 cfg->asid);
+			}
 		}
 	} else {
 		cb->ttbr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vttbr;
@@ -673,6 +678,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 	enum io_pgtable_fmt fmt;
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+	unsigned long quirks = 0;
 
 	mutex_lock(&smmu_domain->init_mutex);
 	if (smmu_domain->smmu)
@@ -741,6 +747,14 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 			oas = smmu->ipa_size;
 		if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) {
 			fmt = ARM_64_LPAE_S1;
+
+			/*
+			 * We are assuming that split pagetables will always use
+			 * SEP_UPSTREAM so we don't need to reduce the size of
+			 * the ias to account for the sign extension bit
+			 */
+			if (smmu_domain->split_pagetables)
+				quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
 		} else if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_L) {
 			fmt = ARM_32_LPAE_S1;
 			ias = min(ias, 32UL);
@@ -810,6 +824,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 		.coherent_walk	= smmu->features & ARM_SMMU_FEAT_COHERENT_WALK,
 		.tlb		= smmu_domain->flush_ops,
 		.iommu_dev	= smmu->dev,
+		.quirks		= quirks,
 	};
 
 	if (smmu_domain->non_strict)
@@ -823,8 +838,16 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 
 	/* Update the domain's page sizes to reflect the page table format */
 	domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
-	domain->geometry.aperture_end = (1UL << ias) - 1;
-	domain->geometry.force_aperture = true;
+
+	if (pgtbl_cfg.quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
+		domain->geometry.aperture_start = ~0UL << ias;
+		domain->geometry.aperture_end = ~0UL;
+		domain->geometry.force_aperture = true;
+	} else {
+		domain->geometry.aperture_end = (1UL << ias) - 1;
+		domain->geometry.force_aperture = true;
+		smmu_domain->split_pagetables = false;
+	}
 
 	/* Initialise the context bank with our page table cfg */
 	arm_smmu_init_context_bank(smmu_domain, &pgtbl_cfg);
@@ -1526,6 +1549,9 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 	case DOMAIN_ATTR_NESTING:
 		*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
 		return 0;
+	case DOMAIN_ATTR_SPLIT_TABLES:
+		*(int *)data = smmu_domain->split_pagetables;
+		return 0;
 	default:
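The aperture arithmetic in the hunk above is worth working through once. For a TTBR1 domain the aperture is the top of the address space (all upper bits set), while a TTBR0 domain starts at zero. The sketch below mirrors that arithmetic in plain C; the `ias` value of 48 in the usage note is an example, not something the patch fixes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mirror of the geometry computation in arm_smmu_init_domain_context():
 * a TTBR1 region covers [~0 << ias, ~0], a TTBR0 region covers
 * [0, (1 << ias) - 1].
 */
static void compute_aperture(unsigned int ias, int ttbr1,
			     uint64_t *start, uint64_t *end)
{
	if (ttbr1) {
		*start = ~0ull << ias;
		*end = ~0ull;
	} else {
		*start = 0;
		*end = (1ull << ias) - 1;
	}
}
```

For an input address size of 48 bits this gives a TTBR1 aperture of `0xffff000000000000`-`0xffffffffffffffff`, which is why the GPU driver can use `aperture_start` directly as the base of its address space.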
[PATCH v6 0/5] iommu/arm-smmu: Split pagetable support for arm-smmu-v2
This is another iteration of the split pagetable support based on the
suggestions from Robin and Will [1].

Background: In order to support per-context pagetables the GPU needs to
enable split tables so that we can store global buffers in the TTBR1 space,
leaving the GPU free to program the TTBR0 register with the address of a
context-specific pagetable.

If the DOMAIN_ATTR_SPLIT_TABLES attribute is set on the domain before
attaching, the context bank assigned to the domain will be programmed to
allow translations in the TTBR1 space. Translations in the TTBR0 region
will be disallowed because, as Robin pointed out, having an un-programmed
TTBR0 register is dangerous. The driver can determine if TTBR1 was
successfully programmed by querying DOMAIN_ATTR_SPLIT_TABLES after
attaching. The domain geometry will also be updated to reflect the virtual
address space for the TTBR1 range.

Upcoming changes will allow auxiliary domains to be attached to the device,
which will enable and program TTBR0.

This patchset is based on top of linux-next-20200409.

Change log:
v6: Cleanups for the arm-smmu TTBR1 patch from Will Deacon
v4: Only program TTBR1 when split pagetables are requested. TTBR0 will be
    enabled later when an auxiliary domain is attached
v3: Remove the implementation-specific code and make split pagetable
    support part of the generic configuration

[1] https://lists.linuxfoundation.org/pipermail/iommu/2020-January/041373.html

Jordan Crouse (5):
  iommu: Add DOMAIN_ATTR_SPLIT_TABLES
  iommu/arm-smmu: Add support for TTBR1
  drm/msm: Attach the IOMMU device during initialization
  drm/msm: Refactor address space initialization
  drm/msm/a6xx: Support split pagetables

 drivers/gpu/drm/msm/adreno/a2xx_gpu.c    | 16 
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c    |  1 +
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c    |  1 +
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c    |  1 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c    | 51 
 drivers/gpu/drm/msm/adreno/adreno_gpu.c  | 23 ---
 drivers/gpu/drm/msm/adreno/adreno_gpu.h  |  8 
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c  | 18 +++--
 drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c | 18 -
 drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c |  4 --
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 18 -
 drivers/gpu/drm/msm/msm_drv.h            |  8 +---
 drivers/gpu/drm/msm/msm_gem_vma.c        | 36 +++--
 drivers/gpu/drm/msm/msm_gpu.c            | 49 +--
 drivers/gpu/drm/msm/msm_gpu.h            |  4 +-
 drivers/gpu/drm/msm/msm_gpummu.c         |  6 ---
 drivers/gpu/drm/msm/msm_iommu.c          | 18 +
 drivers/gpu/drm/msm/msm_mmu.h            |  1 -
 drivers/iommu/arm-smmu.c                 | 48 ++
 drivers/iommu/arm-smmu.h                 | 24 ---
 include/linux/iommu.h                    |  2 +
 21 files changed, 200 insertions(+), 155 deletions(-)

-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v6 5/5] drm/msm/a6xx: Support split pagetables
Attempt to enable split pagetables if the arm-smmu driver supports it.
This will move the default address space from the default region to the
address range assigned to TTBR1. The behavior should be transparent to
the driver for now, but it gets the default buffers out of the way when
we want to start swapping TTBR0 for context-specific pagetables.

Signed-off-by: Jordan Crouse
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 52 ++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 02ade43d6335..b27daa77723c 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -825,6 +825,56 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu *gpu)
 	return (unsigned long)busy_time;
 }
 
+static struct msm_gem_address_space *
+a6xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
+{
+	struct iommu_domain *iommu = iommu_domain_alloc(&platform_bus_type);
+	struct msm_gem_address_space *aspace;
+	struct msm_mmu *mmu;
+	u64 start, size;
+	u32 val = 1;
+	int ret;
+
+	if (!iommu)
+		return ERR_PTR(-ENOMEM);
+
+	/*
+	 * Try to request split pagetables - the request has to be made before
+	 * the domian is attached
+	 */
+	iommu_domain_set_attr(iommu, DOMAIN_ATTR_SPLIT_TABLES, &val);
+
+	mmu = msm_iommu_new(&pdev->dev, iommu);
+	if (IS_ERR(mmu)) {
+		iommu_domain_free(iommu);
+		return ERR_CAST(mmu);
+	}
+
+	/*
+	 * After the domain is attached, see if the split tables were actually
+	 * successful.
+	 */
+	ret = iommu_domain_get_attr(iommu, DOMAIN_ATTR_SPLIT_TABLES, &val);
+	if (!ret && val) {
+		/*
+		 * The aperture start will be at the beginning of the TTBR1
+		 * space so use that as a base
+		 */
+		start = iommu->geometry.aperture_start;
+		size = 0xffffffff;
+	} else {
+		/* Otherwise use the legacy 32 bit region */
+		start = SZ_16M;
+		size = 0xffffffff - SZ_16M;
+	}
+
+	aspace = msm_gem_address_space_create(mmu, "gpu", start, size);
+	if (IS_ERR(aspace))
+		iommu_domain_free(iommu);
+
+	return aspace;
+}
+
 static const struct adreno_gpu_funcs funcs = {
 	.base = {
 		.get_param = adreno_get_param,
@@ -847,7 +897,7 @@ static const struct adreno_gpu_funcs funcs = {
 		.gpu_state_get = a6xx_gpu_state_get,
 		.gpu_state_put = a6xx_gpu_state_put,
 #endif
-		.create_address_space = adreno_iommu_create_address_space,
+		.create_address_space = a6xx_create_address_space,
 	},
 	.get_timestamp = a6xx_get_timestamp,
 };
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
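The range selection in the patch reduces to a small piece of arithmetic, sketched below in userspace C. `TOY_SZ_16M` mirrors the kernel's `SZ_16M`, the 32-bit size limit mirrors the constants in the patch, and the example `aperture_start` value assumes a 48-bit input address size; none of these are anything beyond what the diff above already uses.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SZ_16M (16ull << 20)

struct toy_range {
	uint64_t start;
	uint64_t size;
};

/*
 * Mirror of the decision in a6xx_create_address_space(): if split
 * tables were granted, base the GPU address space on the TTBR1
 * aperture reported by the IOMMU; otherwise fall back to the legacy
 * 32-bit region that skips the first 16M.
 */
static struct toy_range pick_gpu_range(int split_ok, uint64_t aperture_start)
{
	struct toy_range r;

	if (split_ok) {
		r.start = aperture_start;
		r.size = 0xffffffffull;
	} else {
		r.start = TOY_SZ_16M;
		r.size = 0xffffffffull - TOY_SZ_16M;
	}
	return r;
}
```

Either way the GPU keeps a 32-bit-sized managed region; only its base moves, which is what makes the change transparent to the rest of the driver.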
[PATCH v6 1/5] iommu: Add DOMAIN_ATTR_SPLIT_TABLES
Add a new attribute to enable and query the state of split pagetables
for the domain.

Acked-by: Will Deacon
Signed-off-by: Jordan Crouse
---
 include/linux/iommu.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 7ef8b0bda695..d0f96f748a00 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -126,6 +126,8 @@ enum iommu_attr {
 	DOMAIN_ATTR_FSL_PAMUV1,
 	DOMAIN_ATTR_NESTING,	/* two stages of translation */
 	DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+	/* Enable split pagetables (for example, TTBR1 on arm-smmu) */
+	DOMAIN_ATTR_SPLIT_TABLES,
 	DOMAIN_ATTR_MAX,
 };
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v6 4/5] drm/msm: Refactor address space initialization
Refactor how address space initialization works. Instead of having the address space function create the MMU object (and thus require separate but equal functions for gpummu and iommu) use a single function and pass the MMU struct in. Make the generic code cleaner by using target specific functions to create the address space so a2xx can do its own thing in its own space. For all the other targets use a generic helper to initialize IOMMU but leave the door open for newer targets to use customization if they need it. Reviewed-by: Rob Clark Signed-off-by: Jordan Crouse --- drivers/gpu/drm/msm/adreno/a2xx_gpu.c| 16 drivers/gpu/drm/msm/adreno/a3xx_gpu.c| 1 + drivers/gpu/drm/msm/adreno/a4xx_gpu.c| 1 + drivers/gpu/drm/msm/adreno/a5xx_gpu.c| 1 + drivers/gpu/drm/msm/adreno/a6xx_gpu.c| 1 + drivers/gpu/drm/msm/adreno/adreno_gpu.c | 23 --- drivers/gpu/drm/msm/adreno/adreno_gpu.h | 8 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 10 ++--- drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c | 14 --- drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c | 4 -- drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 11 - drivers/gpu/drm/msm/msm_drv.h| 8 +--- drivers/gpu/drm/msm/msm_gem_vma.c| 51 +++- drivers/gpu/drm/msm/msm_gpu.c| 40 +-- drivers/gpu/drm/msm/msm_gpu.h| 4 +- drivers/gpu/drm/msm/msm_iommu.c | 3 ++ 16 files changed, 82 insertions(+), 114 deletions(-) diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c index 1f83bc18d500..60f6472a3e58 100644 --- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c @@ -401,6 +401,21 @@ static struct msm_gpu_state *a2xx_gpu_state_get(struct msm_gpu *gpu) return state; } +static struct msm_gem_address_space * +a2xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev) +{ + struct msm_mmu *mmu = msm_gpummu_new(>dev, gpu); + struct msm_gem_address_space *aspace; + + aspace = msm_gem_address_space_create(mmu, "gpu", SZ_16M, + SZ_16M + 0xfff * SZ_64K); + + if (IS_ERR(aspace) && !IS_ERR(mmu)) + 
mmu->funcs->destroy(mmu); + + return aspace; +} + /* Register offset defines for A2XX - copy of A3XX */ static const unsigned int a2xx_register_offsets[REG_ADRENO_REGISTER_MAX] = { REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_BASE, REG_AXXX_CP_RB_BASE), @@ -429,6 +444,7 @@ static const struct adreno_gpu_funcs funcs = { #endif .gpu_state_get = a2xx_gpu_state_get, .gpu_state_put = adreno_gpu_state_put, + .create_address_space = a2xx_create_address_space, }, }; diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c index b67f88872726..0a5ea9f56cb8 100644 --- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c @@ -441,6 +441,7 @@ static const struct adreno_gpu_funcs funcs = { #endif .gpu_state_get = a3xx_gpu_state_get, .gpu_state_put = adreno_gpu_state_put, + .create_address_space = adreno_iommu_create_address_space, }, }; diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c index 253d8d85daad..b626afb0627d 100644 --- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c @@ -532,6 +532,7 @@ static const struct adreno_gpu_funcs funcs = { #endif .gpu_state_get = a4xx_gpu_state_get, .gpu_state_put = adreno_gpu_state_put, + .create_address_space = adreno_iommu_create_address_space, }, .get_timestamp = a4xx_get_timestamp, }; diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c index 724024a2243a..e81b1deaf535 100644 --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c @@ -1439,6 +1439,7 @@ static const struct adreno_gpu_funcs funcs = { .gpu_busy = a5xx_gpu_busy, .gpu_state_get = a5xx_gpu_state_get, .gpu_state_put = a5xx_gpu_state_put, + .create_address_space = adreno_iommu_create_address_space, }, .get_timestamp = a5xx_get_timestamp, }; diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c index 68af24150de5..02ade43d6335 100644 --- 
a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c @@ -847,6 +847,7 @@ static const struct adreno_gpu_funcs funcs = { .gpu_state_get = a6xx_gpu_state_get, .gpu_state_put = a6xx_gpu_state_put, #endif + .create_address_space = adreno_iommu_create_address_space, }, .get_timestamp = a6xx_get_timestamp, }; diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
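The shape of the refactor above — a per-target `create_address_space` hook in the funcs table, with one generic IOMMU helper that most targets share and an a2xx override — can be sketched in plain userspace C. All names and the two-entry funcs tables below are illustrative stand-ins, not the actual msm driver API:

```c
#include <assert.h>
#include <stddef.h>

struct aspace { const char *backend; };

struct gpu;
struct gpu_funcs {
	struct aspace *(*create_address_space)(struct gpu *gpu);
};

struct gpu {
	const char *name;
	const struct gpu_funcs *funcs;
};

static struct aspace gpummu_aspace = { "gpummu" };
static struct aspace iommu_aspace = { "iommu" };

/* generic helper: what a3xx..a6xx plug into their funcs tables */
static struct aspace *generic_iommu_create_address_space(struct gpu *gpu)
{
	(void)gpu;
	return &iommu_aspace;
}

/* a2xx override: uses the GPU-internal MMU instead of the IOMMU */
static struct aspace *a2xx_create_address_space(struct gpu *gpu)
{
	(void)gpu;
	return &gpummu_aspace;
}

static const struct gpu_funcs a2xx_funcs = { a2xx_create_address_space };
static const struct gpu_funcs a6xx_funcs = { generic_iommu_create_address_space };

/* shared init path: no per-target if/else chain needed anymore */
static struct aspace *gpu_init_aspace(struct gpu *gpu)
{
	return gpu->funcs->create_address_space(gpu);
}
```

The common init code stays target-agnostic; a new GPU generation only fills in its own hook.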
Re: [PATCH 0/2] iommu/arm-smmu: Allow client devices to select direct mapping
On Tue, Feb 04, 2020 at 11:12:17PM +0530, Sai Prakash Ranjan wrote: > Hello Robin, Will > > On 2020-01-22 17:18, Sai Prakash Ranjan wrote: > > This series allows drm devices to set a default identity > > mapping using iommu_request_dm_for_dev(). First patch is > > a cleanup to support other SoCs to call into QCOM specific > > implementation and preparation for second patch. > > Second patch sets the default identity domain for drm devices. > > > > Jordan Crouse (1): > > iommu/arm-smmu: Allow client devices to select direct mapping > > > > Sai Prakash Ranjan (1): > > iommu: arm-smmu-impl: Convert to a generic reset implementation > > > > drivers/iommu/arm-smmu-impl.c | 8 +++-- > > drivers/iommu/arm-smmu-qcom.c | 55 +-- > > drivers/iommu/arm-smmu.c | 3 ++ > > drivers/iommu/arm-smmu.h | 5 > > 4 files changed, 65 insertions(+), 6 deletions(-) > > Any review comments? Ping What is the status of this series, is it ready to land or are any changes needed? Thanks Matthias ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 19/28] gpu/drm: remove the powerpc hack in drm_legacy_sg_alloc
On Thu, 2020-04-09 at 11:41 +0200, Daniel Vetter wrote: > Now if these boxes didn't ever have agp then I think we can get away > with deleting this, since we've already deleted the legacy radeon > driver. And that one used vmalloc for everything. The new kms one does > use the dma-api if the gpu isn't connected through agp Definitely no AGP there. Cheers Ben.
Re: [rfc v2 4/6] dma-direct: atomic allocations must come from atomic coherent pools
On 4/8/20 4:21 PM, David Rientjes wrote: When a device required unencrypted memory and the context does not allow required => requires blocking, memory must be returned from the atomic coherent pools. This avoids the remap when CONFIG_DMA_DIRECT_REMAP is not enabled and the config only requires CONFIG_DMA_COHERENT_POOL. This will be used for CONFIG_AMD_MEM_ENCRYPT in a subsequent patch. Keep all memory in these pools unencrypted. Signed-off-by: David Rientjes --- kernel/dma/direct.c | 16 kernel/dma/pool.c | 15 +-- 2 files changed, 29 insertions(+), 2 deletions(-) diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 70800ca64f13..44165263c185 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -124,6 +124,18 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, struct page *page; void *ret; + /* +* Unencrypted memory must come directly from DMA atomic pools if +* blocking is not allowed. +*/ + if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) && + force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp)) { + ret = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page, gfp); + if (!ret) + return NULL; + goto done; + } + if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && dma_alloc_need_uncached(dev, attrs) && !gfpflags_allow_blocking(gfp)) { @@ -203,6 +215,10 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr, { unsigned int page_order = get_order(size); + if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) && + dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size))) + return; + if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) && !force_dma_unencrypted(dev)) { /* cpu_addr is a struct page cookie, not a kernel address */ diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c index e14c5a2da734..6685ab89cfa7 100644 --- a/kernel/dma/pool.c +++ b/kernel/dma/pool.c @@ -9,6 +9,7 @@ #include #include #include +#include #include #include #include @@ -55,12 +56,20 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size, arch_dma_prep_coherent(page,
pool_size); +#ifdef CONFIG_DMA_DIRECT_REMAP addr = dma_common_contiguous_remap(page, pool_size, pgprot_dmacoherent(PAGE_KERNEL), __builtin_return_address(0)); if (!addr) goto free_page; - +#else + addr = page_to_virt(page); +#endif + /* +* Memory in the atomic DMA pools must be unencrypted, the pools do not +* shrink so no re-encryption occurs in dma_direct_free_pages(). +*/ + set_memory_decrypted((unsigned long)page_to_virt(page), 1 << order); ret = gen_pool_add_virt(pool, (unsigned long)addr, page_to_phys(page), pool_size, NUMA_NO_NODE); if (ret) @@ -69,8 +78,10 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size, return 0; remove_mapping: +#ifdef CONFIG_DMA_DIRECT_REMAP dma_common_free_remap(addr, pool_size); You're about to free the memory, but you've called set_memory_decrypted() against it, so you need to do a set_memory_encrypted() to bring it back to a state ready for allocation again. Thanks, Tom -free_page: +#endif +free_page: __maybe_unused if (!dma_release_from_contiguous(NULL, page, 1 << order)) __free_pages(page, order); out: ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
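The policy the patch implements — unencrypted memory requested from a non-blocking context must come from a pool that was populated (and decrypted) earlier, while blocking callers take the normal path — can be modeled with a toy userspace bump allocator. The `can_block`/`need_unencrypted` flags and the fixed-size pool are stand-ins for GFP state and the kernel's gen_pool, not the actual API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_SIZE 4096

/* carved out and decrypted up front, while blocking was still allowed */
static unsigned char pool[POOL_SIZE];
static size_t pool_used;

static void *pool_alloc(size_t size)
{
	if (pool_used + size > POOL_SIZE)
		return NULL;	/* pool exhausted: the atomic path just fails */
	void *p = pool + pool_used;
	pool_used += size;
	return p;
}

/* stand-in for the normal path (remap + set_memory_decrypted, may sleep) */
static void *blocking_alloc(size_t size)
{
	return malloc(size);
}

static void *dma_alloc(size_t size, bool can_block, bool need_unencrypted)
{
	/* unencrypted memory in atomic context must come from the pool */
	if (need_unencrypted && !can_block)
		return pool_alloc(size);
	return blocking_alloc(size);
}
```

Note the failure mode this models: once the pool is exhausted, an atomic unencrypted allocation fails outright rather than sleeping, which is why the kernel pool is replenished from a workqueue.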
Re: [PATCH 25/28] mm: remove vmalloc_user_node_flags
cc Johannes who suggested this API call originally On Wed, Apr 8, 2020 at 5:03 AM Christoph Hellwig wrote: > > Open code it in __bpf_map_area_alloc, which is the only caller. Also > clean up __bpf_map_area_alloc to have a single vmalloc call with > slightly different flags instead of the current two different calls. > > For this to compile for the nommu case add a __vmalloc_node_range stub > to nommu.c. > > Signed-off-by: Christoph Hellwig > --- > include/linux/vmalloc.h | 1 - > kernel/bpf/syscall.c| 23 +-- > mm/nommu.c | 14 -- > mm/vmalloc.c| 20 > 4 files changed, 21 insertions(+), 37 deletions(-) > > diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h > index 108f49b47756..f90f2946aac2 100644 > --- a/include/linux/vmalloc.h > +++ b/include/linux/vmalloc.h > @@ -106,7 +106,6 @@ extern void *vzalloc(unsigned long size); > extern void *vmalloc_user(unsigned long size); > extern void *vmalloc_node(unsigned long size, int node); > extern void *vzalloc_node(unsigned long size, int node); > -extern void *vmalloc_user_node_flags(unsigned long size, int node, gfp_t > flags); > extern void *vmalloc_exec(unsigned long size); > extern void *vmalloc_32(unsigned long size); > extern void *vmalloc_32_user(unsigned long size); > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index 48d98ea8fad6..249d9bd43321 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -281,26 +281,29 @@ static void *__bpf_map_area_alloc(u64 size, int > numa_node, bool mmapable) > * __GFP_RETRY_MAYFAIL to avoid such situations. 
> */ > > - const gfp_t flags = __GFP_NOWARN | __GFP_ZERO; > + const gfp_t gfp = __GFP_NOWARN | __GFP_ZERO; > + unsigned int flags = 0; > + unsigned long align = 1; > void *area; > > if (size >= SIZE_MAX) > return NULL; > > /* kmalloc()'ed memory can't be mmap()'ed */ > - if (!mmapable && size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) { > - area = kmalloc_node(size, GFP_USER | __GFP_NORETRY | flags, > + if (mmapable) { > + BUG_ON(!PAGE_ALIGNED(size)); > + align = SHMLBA; > + flags = VM_USERMAP; > + } else if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) { > + area = kmalloc_node(size, gfp | GFP_USER | __GFP_NORETRY, > numa_node); > if (area != NULL) > return area; > } > - if (mmapable) { > - BUG_ON(!PAGE_ALIGNED(size)); > - return vmalloc_user_node_flags(size, numa_node, GFP_KERNEL | > - __GFP_RETRY_MAYFAIL | flags); > - } > - return __vmalloc_node(size, 1, GFP_KERNEL | __GFP_RETRY_MAYFAIL | > flags, > - numa_node, __builtin_return_address(0)); > + > + return __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END, > + gfp | GFP_KERNEL | __GFP_RETRY_MAYFAIL, PAGE_KERNEL, > + flags, numa_node, __builtin_return_address(0)); > } > > void *bpf_map_area_alloc(u64 size, int numa_node) > diff --git a/mm/nommu.c b/mm/nommu.c > index 81a86cd85893..b42cd6003d7d 100644 > --- a/mm/nommu.c > +++ b/mm/nommu.c > @@ -150,6 +150,14 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask) > } > EXPORT_SYMBOL(__vmalloc); > > +void *__vmalloc_node_range(unsigned long size, unsigned long align, > + unsigned long start, unsigned long end, gfp_t gfp_mask, > + pgprot_t prot, unsigned long vm_flags, int node, > + const void *caller) > +{ > + return __vmalloc(size, flags); > +} > + > void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask, > int node, const void *caller) > { > @@ -180,12 +188,6 @@ void *vmalloc_user(unsigned long size) > } > EXPORT_SYMBOL(vmalloc_user); > > -void *vmalloc_user_node_flags(unsigned long size, int node, gfp_t flags) > -{ > - return 
__vmalloc_user_flags(size, flags | __GFP_ZERO); > -} > -EXPORT_SYMBOL(vmalloc_user_node_flags); > - > struct page *vmalloc_to_page(const void *addr) > { > return virt_to_page(addr); > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > index 333fbe77255a..f6f2acdaf70c 100644 > --- a/mm/vmalloc.c > +++ b/mm/vmalloc.c > @@ -2658,26 +2658,6 @@ void *vzalloc_node(unsigned long size, int node) > } > EXPORT_SYMBOL(vzalloc_node); > > -/** > - * vmalloc_user_node_flags - allocate memory for userspace on a specific node > - * @size: allocation size > - * @node: numa node > - * @flags: flags for the page level allocator > - * > - * The resulting memory area is zeroed so it can be mapped to userspace > - * without leaking data. > - * > - * Return: pointer to the allocated memory or %NULL on error > - */ > -void
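The cleanup in `__bpf_map_area_alloc` quoted above follows a common refactoring pattern: compute `align` and `flags` up front from the `mmapable` case, then end in a single generic allocation call instead of two specialized vmalloc variants. A miniature model of that shape (struct `area`, `VM_USERMAP`, and `PAGE` here are illustrative stand-ins, not the kernel definitions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE 4096
#define VM_USERMAP 0x1

struct area {
	size_t align;
	int flags;
	void *mem;
};

/* the one generic allocator (think __vmalloc_node_range) */
static struct area *generic_alloc(size_t size, size_t align, int flags)
{
	struct area *a = malloc(sizeof(*a));
	if (!a)
		return NULL;
	a->align = align;
	a->flags = flags;
	a->mem = calloc(1, size);	/* zeroed, like __GFP_ZERO */
	return a;
}

static struct area *map_area_alloc(size_t size, bool mmapable)
{
	size_t align = 1;
	int flags = 0;

	if (mmapable) {
		assert(size % PAGE == 0);	/* user-mappable: page sized */
		align = PAGE;			/* stand-in for SHMLBA */
		flags = VM_USERMAP;
	}
	/* one tail call handles both cases via parameters */
	return generic_alloc(size, align, flags);
}
```

The benefit is the same as in the patch: callers that differ only in alignment and mapping flags no longer need separate exported entry points.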
[PATCH 0/1] Real DMA dev DMA domain patch
Sorry for the late patch here, but I hit the issue Baolu and Daniel pointed out could occur, which requires this fix (or iommu=nopt). Hoping to get it into an rc. Jon Derrick (1): iommu/vt-d: use DMA domain for real DMA devices and subdevices drivers/iommu/intel-iommu.c | 56 - 1 file changed, 43 insertions(+), 13 deletions(-) -- 2.18.1
[PATCH 1/1] iommu/vt-d: use DMA domain for real DMA devices and subdevices
The PCI devices handled by intel-iommu may have a DMA requester on another bus, such as VMD subdevices needing to use the VMD endpoint. The real DMA device is now used for the DMA mapping, but one case was missed earlier: if the VMD device (and hence subdevices too) are under IOMMU_DOMAIN_IDENTITY, mappings do not work. Codepaths like intel_map_page() handle the IOMMU_DOMAIN_DMA case by creating an iommu DMA mapping, and fall back on dma_direct_map_page() for the IOMMU_DOMAIN_IDENTITY case. However, handling of the IDENTITY case is broken when intel_map_page() handles a subdevice. We observe that at iommu attach time, dmar_insert_one_dev_info() for the subdevices will never set dev->archdata.iommu. This is because that function uses find_domain() to check if there is already an IOMMU for the device, and find_domain() then defers to the real DMA device which does have one. Thus dmar_insert_one_dev_info() returns without assigning dev->archdata.iommu. Then, later: 1. intel_map_page() checks if an IOMMU mapping is needed by calling iommu_need_mapping() on the subdevice. identity_mapping() returns false because dev->archdata.iommu is NULL, so this function returns true, indicating that mapping is needed. 2. __intel_map_single() is called to create the mapping. 3. __intel_map_single() calls find_domain(). This function now returns the IDENTITY domain corresponding to the real DMA device. 4. __intel_map_single() calls domain_get_iommu() on this "real" domain. A failure is hit and the entire operation is aborted, because this codepath is not intended to handle IDENTITY mappings: if (WARN_ON(domain->domain.type != IOMMU_DOMAIN_DMA)) return NULL; This becomes problematic if the real DMA device and the subdevices have different addressing capabilities and some require translation. Instead we can put the real DMA dev and any subdevices on the DMA domain. This change assigns subdevices to the DMA domain, and moves the real DMA device to the DMA domain if necessary.
Reported-by: Daniel Drake Fixes: b0140c69637e ("iommu/vt-d: Use pci_real_dma_dev() for mapping") Signed-off-by: Jon Derrick Signed-off-by: Daniel Drake --- drivers/iommu/intel-iommu.c | 56 - 1 file changed, 43 insertions(+), 13 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index ef0a5246700e..b4844a502499 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -3049,6 +3049,9 @@ static int device_def_domain_type(struct device *dev) if ((iommu_identity_mapping & IDENTMAP_GFX) && IS_GFX_DEVICE(pdev)) return IOMMU_DOMAIN_IDENTITY; + if (pci_real_dma_dev(pdev) != pdev) + return IOMMU_DOMAIN_DMA; + /* * We want to start off with all devices in the 1:1 domain, and * take them out later if we find they can't access all of memory. @@ -5781,12 +5784,32 @@ static bool intel_iommu_capable(enum iommu_cap cap) return false; } +static int intel_iommu_request_dma_domain_for_dev(struct device *dev, + struct dmar_domain *domain) +{ + int ret; + + ret = iommu_request_dma_domain_for_dev(dev); + if (ret) { + dmar_remove_one_dev_info(dev); + domain->flags |= DOMAIN_FLAG_LOSE_CHILDREN; + if (!get_private_domain_for_dev(dev)) { + dev_warn(dev, +"Failed to get a private domain.\n"); + return -ENOMEM; + } + } + + return 0; +} + static int intel_iommu_add_device(struct device *dev) { struct dmar_domain *dmar_domain; struct iommu_domain *domain; struct intel_iommu *iommu; struct iommu_group *group; + struct device *real_dev = dev; u8 bus, devfn; int ret; @@ -5810,6 +5833,21 @@ static int intel_iommu_add_device(struct device *dev) domain = iommu_get_domain_for_dev(dev); dmar_domain = to_dmar_domain(domain); + + if (dev_is_pci(dev)) + real_dev = &pci_real_dma_dev(to_pci_dev(dev))->dev; + + if (real_dev != dev) { + domain = iommu_get_domain_for_dev(real_dev); + if (domain->type != IOMMU_DOMAIN_DMA) { + dmar_remove_one_dev_info(real_dev); + + ret = intel_iommu_request_dma_domain_for_dev(real_dev, dmar_domain); + if (ret) + goto unlink; + }
+ } + if (domain->type == IOMMU_DOMAIN_DMA) { if (device_def_domain_type(dev) == IOMMU_DOMAIN_IDENTITY) { ret = iommu_request_dm_for_dev(dev); @@ -5823,20 +5861,12 @@ static int intel_iommu_add_device(struct device *dev) } } else {
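The fix boils down to two rules: a device whose DMA requester is a different device must sit in the DMA (translated) domain, and the requester itself is moved there too if it was set up as identity. A compact userspace model of that logic — the names mirror but do not reproduce the intel-iommu API:

```c
#include <assert.h>
#include <stddef.h>

enum domain_type { DOMAIN_IDENTITY, DOMAIN_DMA };

struct dev {
	struct dev *dma_alias;	/* NULL: the device issues its own DMA */
	enum domain_type domain;
};

static struct dev *real_dma_dev(struct dev *d)
{
	return d->dma_alias ? d->dma_alias : d;
}

/* models the device_def_domain_type() hunk: subdevices need translation */
static enum domain_type def_domain_type(struct dev *d)
{
	if (real_dma_dev(d) != d)
		return DOMAIN_DMA;
	return d->domain;
}

/* models the intel_iommu_add_device() hunk */
static void add_device(struct dev *d)
{
	struct dev *real = real_dma_dev(d);

	/* pull the real requester out of the identity domain if needed */
	if (real != d && real->domain != DOMAIN_DMA)
		real->domain = DOMAIN_DMA;

	d->domain = def_domain_type(d);
}
```

With a VMD-style endpoint initially in the identity domain, adding one of its subdevices leaves both in the DMA domain, matching the behavior the patch wants.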
Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc
On Wed, Apr 08, 2020 at 01:59:08PM +0200, Christoph Hellwig wrote: > This allows to unexport map_vm_area and unmap_kernel_range, which are > rather deep internal and should not be available to modules. Even though I don't know how many usecase we have using zsmalloc as module(I heard only once by dumb reason), it could affect existing users. Thus, please include concrete explanation in the patch to justify when the complain occurs. > > Signed-off-by: Christoph Hellwig > --- > mm/Kconfig | 2 +- > mm/vmalloc.c | 2 -- > 2 files changed, 1 insertion(+), 3 deletions(-) > > diff --git a/mm/Kconfig b/mm/Kconfig > index 36949a9425b8..614cc786b519 100644 > --- a/mm/Kconfig > +++ b/mm/Kconfig > @@ -702,7 +702,7 @@ config ZSMALLOC > > config ZSMALLOC_PGTABLE_MAPPING > bool "Use page table mapping to access object in zsmalloc" > - depends on ZSMALLOC > + depends on ZSMALLOC=y > help > By default, zsmalloc uses a copy-based object mapping method to > access allocations that span two pages. However, if a particular > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > index 3375f9508ef6..9183fc0d365a 100644 > --- a/mm/vmalloc.c > +++ b/mm/vmalloc.c > @@ -2046,7 +2046,6 @@ void unmap_kernel_range(unsigned long addr, unsigned > long size) > vunmap_page_range(addr, end); > flush_tlb_kernel_range(addr, end); > } > -EXPORT_SYMBOL_GPL(unmap_kernel_range); > > int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page **pages) > { > @@ -2058,7 +2057,6 @@ int map_vm_area(struct vm_struct *area, pgprot_t prot, > struct page **pages) > > return err > 0 ? 0 : err; > } > -EXPORT_SYMBOL_GPL(map_vm_area); > > static inline void setup_vmalloc_vm_locked(struct vm_struct *vm, > struct vmap_area *va, unsigned long flags, const void *caller) > -- > 2.25.1 > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 09/28] mm: rename CONFIG_PGTABLE_MAPPING to CONFIG_ZSMALLOC_PGTABLE_MAPPING
On Wed, Apr 08, 2020 at 01:59:07PM +0200, Christoph Hellwig wrote: > Rename the Kconfig variable to clarify the scope. > > Signed-off-by: Christoph Hellwig Acked-by: Minchan Kim
Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc
On Thu, Apr 09, 2020 at 06:50:30PM +0200, Peter Zijlstra wrote: > On Thu, Apr 09, 2020 at 09:08:26AM -0700, Minchan Kim wrote: > > On Wed, Apr 08, 2020 at 01:59:08PM +0200, Christoph Hellwig wrote: > > > This allows to unexport map_vm_area and unmap_kernel_range, which are > > > rather deep internal and should not be available to modules. > > > > Even though I don't know how many usecase we have using zsmalloc as > > module(I heard only once by dumb reason), it could affect existing > > users. Thus, please include concrete explanation in the patch to > > justify when the complain occurs. > > The justification is 'we can unexport functions that have no sane reason > of being exported in the first place'. > > The Changelog pretty much says that. Okay, I hope there is no affected user since this patch. If there are someone, they need to provide sane reason why they want to have zsmalloc as module. Acked-by: Minchan Kim
Re: [PATCH 01/28] x86/hyperv: use vmalloc_exec for the hypercall page
On Wed, Apr 08, 2020 at 01:58:59PM +0200, Christoph Hellwig wrote: > Use the designated helper for allocating executable kernel memory, and > remove the now unused PAGE_KERNEL_RX define. > > Signed-off-by: Christoph Hellwig Acked-by: Wei Liu
Re: [PATCH 0/2] iommu: Remove iommu_sva_ops::mm_exit()
On Thu, Apr 09, 2020 at 09:21:34AM -0700, Jacob Pan wrote: > On Thu, 9 Apr 2020 11:25:19 -0300 > Jason Gunthorpe wrote: > > > On Thu, Apr 09, 2020 at 07:14:24AM -0700, Jacob Pan wrote: > > > > When the process is killed, mm release can happen before fds are > > > > released. If you look at do_exit() in kernel/exit.c: > > > > > > > > exit_mm() > > > > mmput() > > > >-> mmu release notifier > > > > ... > > > > exit_files() > > > > close_files() > > > > fput() > > > > exit_task_work() > > > > __fput() > > > >-> unbind() > > > > > > > So unbind is coming anyway, the difference in handling in mmu > > > release notifier is whether we silently drop DMA fault vs. > > > reporting fault? > > > > Userspace can significantly delay the final fput triggering the > > unbind, the above is only for the trivial case where the process > > owning the mm_struct is the only process holding the fd. > > > Are you talking about FDs owned by children after fork() or FDs sent > over to another process. I think, in either case SVA is not supported. Supported or not a hostile user space can trigger these conditions and it should not cause misbehavior from the kernel (eg log spamming) Jason
Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc
On Thu, Apr 09, 2020 at 09:08:26AM -0700, Minchan Kim wrote: > On Wed, Apr 08, 2020 at 01:59:08PM +0200, Christoph Hellwig wrote: > > This allows to unexport map_vm_area and unmap_kernel_range, which are > > rather deep internal and should not be available to modules. > > Even though I don't know how many usecase we have using zsmalloc as > module(I heard only once by dumb reason), it could affect existing > users. Thus, please include concrete explanation in the patch to > justify when the complain occurs. The justification is 'we can unexport functions that have no sane reason of being exported in the first place'. The Changelog pretty much says that.
Re: [PATCH 0/2] iommu: Remove iommu_sva_ops::mm_exit()
On Thu, 9 Apr 2020 09:08:21 -0300 Jason Gunthorpe wrote: > On Wed, Apr 08, 2020 at 04:48:02PM -0700, Jacob Pan wrote: > > > Yes, this is the proper way, when the DMA is stopped and no use > > > of the PASID remains then you can drop the mmu notifier and > > > release the PASID entirely. If that is linked to the lifetime of > > > the FD then forget completely about the mm_struct lifetime, it > > > doesn't matter.. > > Got everything above, thanks a lot. > > > > If everything is in order with the FD close. Why do we need to > > "ask IOMMU drivers to silently abort DMA and Page Requests in the > > meantime." in mm_exit notifier? This will be done orderly in unbind > > anyway. > > I thought this is exactly what Jean-Phillippe is removing here, it is > a bad idea for the reasons he explained. > I think this patch only removed driver side callbacks, i.e. to stop DMA. But not removing IOMMU side of stop DMA, PRS. Before uacce, (universal accelerator framework), sva bind/unbind is not guaranteed at open/close FD time. Therefore, mmu notifier is needed if mmexit comes without unbind. > > > > Enforcing unbind upon FD close might be a precarious path, > > > > perhaps that is why we have to deal with out of order > > > > situation? > > > > > > How so? You just put it in the FD release function :) > > > > I was thinking some driver may choose to defer unbind in some > > workqueue etc. > > Doesn't really change anything, the lifetime of the PASID wouuld be > the lifetime of the notifier and the bind, and DMA can't continue > without the notifier registered. > True, it is just better not to defer. Otherwise, the window of suppressing error gets longer. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/2] iommu: Remove iommu_sva_ops::mm_exit()
On Thu, 9 Apr 2020 16:50:58 +0200 Jean-Philippe Brucker wrote: > On Thu, Apr 09, 2020 at 07:14:24AM -0700, Jacob Pan wrote: > > On Thu, 9 Apr 2020 08:39:05 +0200 > > Jean-Philippe Brucker wrote: > > > > > On Wed, Apr 08, 2020 at 04:48:02PM -0700, Jacob Pan wrote: > > > > On Wed, 8 Apr 2020 19:32:18 -0300 > > > > Jason Gunthorpe wrote: > > > > > > > > > On Wed, Apr 08, 2020 at 02:35:52PM -0700, Jacob Pan wrote: > > > > > > > On Wed, Apr 08, 2020 at 11:35:52AM -0700, Jacob Pan > > > > > > > wrote: > > > > > > > > Hi Jean, > > > > > > > > > > > > > > > > On Wed, 8 Apr 2020 16:04:25 +0200 > > > > > > > > Jean-Philippe Brucker wrote: > > > > > > > > > > > > > > > > > The IOMMU SVA API currently requires device drivers to > > > > > > > > > implement an mm_exit() callback, which stops device > > > > > > > > > jobs that do DMA. This function is called in the > > > > > > > > > release() MMU notifier, when an address space that is > > > > > > > > > shared with a device exits. > > > > > > > > > > > > > > > > > > It has been noted several time during discussions > > > > > > > > > about SVA that cancelling DMA jobs can be slow and > > > > > > > > > complex, and doing it in the release() notifier might > > > > > > > > > cause synchronization issues (patch 2 has more > > > > > > > > > background). Device drivers must in any case call > > > > > > > > > unbind() to remove their bond, after stopping DMA > > > > > > > > > from a more favorable context (release of a file > > > > > > > > > descriptor). > > > > > > > > > > > > > > > > > > So after mm exits, rather than notifying device > > > > > > > > > drivers, we can hold on to the PASID until unbind(), > > > > > > > > > ask IOMMU drivers to silently abort DMA and Page > > > > > > > > > Requests in the meantime. This change should relieve > > > > > > > > > the mmput() path. 
> > > > > > > > > > > > > > > > I assume mm is destroyed after all the FDs are > > > > > > > > closed > > > > > > > > > > > > > > FDs do not hold a mmget(), but they may hold a mmgrab(), > > > > > > > ie anything using mmu_notifiers has to hold a grab until > > > > > > > the notifier is destroyed, which is often triggered by FD > > > > > > > close. > > > > > > Sorry, I don't get this. Are you saying we have to hold a > > > > > > mmgrab() between svm_bind/mmu_notifier_register and > > > > > > svm_unbind/mmu_notifier_unregister? > > > > > > > > > > Yes. This is done automatically for the caller inside the > > > > > mmu_notifier implementation. We now even store the mm_struct > > > > > pointer inside the notifier. > > > > > > > > > > Once a notifier is registered the mm_struct remains valid > > > > > memory until the notifier is unregistered. > > > > > > > > > > > Isn't the idea of mmu_notifier is to avoid holding the mm > > > > > > reference and rely on the notifier to tell us when mm is > > > > > > going away? > > > > > > > > > > The notifier only holds a mmgrab(), not a mmget() - this > > > > > allows exit_mmap to proceed, but the mm_struct memory remains. > > > > > > > > > > This is also probably why it is a bad idea to tie the > > > > > lifetime of something like a pasid to the mmdrop as a evil > > > > > user could cause a large number of mm structs to be released > > > > > but not freed, probably defeating cgroup limits and so forth > > > > > (not sure) > > > > > > It seems both Intel and AMD iommu drivers don't hold mmgrab > > > > > > after mmu_notifier_register. > > > > > > > > > > It is done internally to the implementation. > > > > > > > > > > > > So the exit_mmap() -> release() may happen before the FDs > > > > > > > are destroyed, but the final mmdrop() will be during some > > > > > > > FD release when the final mmdrop() happens. > > > > > > > > > > > > Do you mean mmdrop() is after FD release? 
> > > > > > > > > > Yes, it will be done by the mmu_notifier_unregister(), which > > > > > should be called during FD release if the iommu lifetime is > > > > > linked to some FD. > > > > > > If so, unbind is called in FD release should take care of > > > > > > everything, i.e. stops DMA, clear PASID context on IOMMU, > > > > > > flush PRS queue etc. > > > > > > > > > > Yes, this is the proper way, when the DMA is stopped and no > > > > > use of the PASID remains then you can drop the mmu notifier > > > > > and release the PASID entirely. If that is linked to the > > > > > lifetime of the FD then forget completely about the mm_struct > > > > > lifetime, it doesn't matter.. > > > > Got everything above, thanks a lot. > > > > > > > > If everything is in order with the FD close. Why do we need to > > > > "ask IOMMU drivers to silently abort DMA and Page Requests in > > > > the meantime." in mm_exit notifier? This will be done orderly > > > > in unbind anyway. > > > > > > When the process is killed, mm release can happen before fds are > > > released. If you look at do_exit() in
Re: [PATCH 0/2] iommu: Remove iommu_sva_ops::mm_exit()
On Thu, 9 Apr 2020 11:25:19 -0300 Jason Gunthorpe wrote: > On Thu, Apr 09, 2020 at 07:14:24AM -0700, Jacob Pan wrote: > > > When the process is killed, mm release can happen before fds are > > > released. If you look at do_exit() in kernel/exit.c: > > > > > > exit_mm() > > > mmput() > > > -> mmu release notifier > > > ... > > > exit_files() > > > close_files() > > > fput() > > > exit_task_work() > > > __fput() > > > -> unbind() > > > > > So unbind is coming anyway, the difference in handling in mmu > > release notifier is whether we silently drop DMA fault vs. > > reporting fault? > > Userspace can significantly delay the final fput triggering the > unbind, the above is only for the trivial case where the process > owning the mm_struct is the only process holding the fd. > Are you talking about FDs owned by children after fork() or FDs sent over to another process. I think, in either case SVA is not supported. > The destruction of a mm_struct should be treated the same as unmapping > every vma in the process. The observable effect should be no different > than munmap. > Good point. I agree, we should suppress the error in the window before unbind. > Jason [Jacob Pan]
Re: [PATCH 19/28] gpu/drm: remove the powerpc hack in drm_legacy_sg_alloc
On Thu, Apr 9, 2020 at 4:19 PM Alex Deucher wrote: > > On Thu, Apr 9, 2020 at 5:41 AM Daniel Vetter wrote: > > > > On Thu, Apr 9, 2020 at 10:54 AM Benjamin Herrenschmidt > > wrote: > > > > > > On Wed, 2020-04-08 at 14:25 +0200, Daniel Vetter wrote: > > > > On Wed, Apr 08, 2020 at 01:59:17PM +0200, Christoph Hellwig wrote: > > > > > If this code was broken for non-coherent caches a crude powerpc hack > > > > > isn't going to help anyone else. Remove the hack as it is the last > > > > > user of __vmalloc passing a page protection flag other than > > > > > PAGE_KERNEL. > > > > > > > > Well Ben added this to make stuff work on ppc, ofc the home grown dma > > > > layer in drm from back then isn't going to work in other places. I guess > > > > should have at least an ack from him, in case anyone still cares about > > > > this on ppc. Adding Ben to cc. > > > > > > This was due to some drivers (radeon ?) trying to use vmalloc pages for > > > coherent DMA, which means on those 4xx powerpc's need to be non-cached. > > > > > > There were machines using that (440 based iirc), though I honestly > > > can't tell if anybody still uses any of it. > > > > agp subsystem still seems to happily do that (vmalloc memory for > > device access), never having been ported to dma apis (or well > > converted to iommu drivers, which they kinda are really). So I think > > this all still works exactly as back then, even with the kms radeon > > drivers. Question really is whether we have users left, and I have no > > clue about that either. > > > > Now if these boxes didn't ever have agp then I think we can get away > > with deleting this, since we've already deleted the legacy radeon > > driver. And that one used vmalloc for everything. The new kms one does > > use the dma-api if the gpu isn't connected through agp. > > All radeons have a built in remapping table to handle non-AGP systems. 
> On the earlier radeons it wasn't quite as performant as AGP, but it > was always more reliable because AGP is AGP. Maybe it's time to let > AGP go? I'd be very much in favour of that, if we can just use the integrated gart and drop agp fast writes wobbliness on the floor. I think the only other modern driver using agp would be nouveau at that point. -Daniel > > Alex > > > -Daniel > > > > > Cheers, > > > Ben. > > > > > > > -Daniel > > > > > > > > > > > > > > Signed-off-by: Christoph Hellwig > > > > > --- > > > > > drivers/gpu/drm/drm_scatter.c | 11 +-- > > > > > 1 file changed, 1 insertion(+), 10 deletions(-) > > > > > > > > > > diff --git a/drivers/gpu/drm/drm_scatter.c > > > > > b/drivers/gpu/drm/drm_scatter.c > > > > > index ca520028b2cb..f4e6184d1877 100644 > > > > > --- a/drivers/gpu/drm/drm_scatter.c > > > > > +++ b/drivers/gpu/drm/drm_scatter.c > > > > > @@ -43,15 +43,6 @@ > > > > > > > > > > #define DEBUG_SCATTER 0 > > > > > > > > > > -static inline void *drm_vmalloc_dma(unsigned long size) > > > > > -{ > > > > > -#if defined(__powerpc__) && defined(CONFIG_NOT_COHERENT_CACHE) > > > > > - return __vmalloc(size, GFP_KERNEL, > > > > > pgprot_noncached_wc(PAGE_KERNEL)); > > > > > -#else > > > > > - return vmalloc_32(size); > > > > > -#endif > > > > > -} > > > > > - > > > > > static void drm_sg_cleanup(struct drm_sg_mem * entry) > > > > > { > > > > > struct page *page; > > > > > @@ -126,7 +117,7 @@ int drm_legacy_sg_alloc(struct drm_device *dev, > > > > > void *data, > > > > > return -ENOMEM; > > > > > } > > > > > > > > > > - entry->virtual = drm_vmalloc_dma(pages << PAGE_SHIFT); > > > > > + entry->virtual = vmalloc_32(pages << PAGE_SHIFT); > > > > > if (!entry->virtual) { > > > > > kfree(entry->busaddr); > > > > > kfree(entry->pagelist); > > > > > -- > > > > > 2.25.1 > > > > > > > > > > > > > > > > > > > > > > -- > > Daniel Vetter > > Software Engineer, Intel Corporation > > +41 (0) 79 365 57 48 - http://blog.ffwll.ch > > ___ > > dri-devel mailing list > > 
dri-de...@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/dri-devel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/exynos: Get rid of 'struct exynos_iommu_owner'
Hi Marek, On Thu, Apr 09, 2020 at 03:58:00PM +0200, Marek Szyprowski wrote: > The main problem after your conversion is the fact that ->probe_device() > is called very early, before any other platform device (thus IOMMU > controller) is probed. It doesn't handle EPROBE_DEFER too. I don't quite understand why probe_device() is called too early, as it is called at the same point where add_device() was called before. But anyway, I have seen a similar problem on OMAP. If the SYSMMU for a master is not probed yet when probe_device() is called, it can just return -ENODEV, and in your driver you just call bus_iommu_probe() when a new SYSMMU gets initialized, to re-probe uninitialized masters on the bus. This patch-set contains a change to export bus_iommu_probe() for exactly that reason. What do you think? Regards, Joerg
Re: [PATCH 0/2] iommu: Remove iommu_sva_ops::mm_exit()
On Thu, Apr 09, 2020 at 07:14:24AM -0700, Jacob Pan wrote: > > When the process is killed, mm release can happen before fds are > > released. If you look at do_exit() in kernel/exit.c: > > > > exit_mm() > > mmput() > >-> mmu release notifier > > ... > > exit_files() > > close_files() > > fput() > > exit_task_work() > > __fput() > >-> unbind() > > > So unbind is coming anyway, the difference in handling in mmu release > notifier is whether we silently drop DMA fault vs. reporting fault? Userspace can significantly delay the final fput triggering the unbind, the above is only for the trivial case where the process owning the mm_struct is the only process holding the fd. The destruction of a mm_struct should be treated the same as unmapping every vma in the process. The observable effect should be no different than munmap. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 19/28] gpu/drm: remove the powerpc hack in drm_legacy_sg_alloc
On Thu, Apr 9, 2020 at 5:41 AM Daniel Vetter wrote: > > On Thu, Apr 9, 2020 at 10:54 AM Benjamin Herrenschmidt > wrote: > > > > On Wed, 2020-04-08 at 14:25 +0200, Daniel Vetter wrote: > > > On Wed, Apr 08, 2020 at 01:59:17PM +0200, Christoph Hellwig wrote: > > > > If this code was broken for non-coherent caches a crude powerpc hack > > > > isn't going to help anyone else. Remove the hack as it is the last > > > > user of __vmalloc passing a page protection flag other than PAGE_KERNEL. > > > > > > Well Ben added this to make stuff work on ppc, ofc the home grown dma > > > layer in drm from back then isn't going to work in other places. I guess > > > should have at least an ack from him, in case anyone still cares about > > > this on ppc. Adding Ben to cc. > > > > This was due to some drivers (radeon ?) trying to use vmalloc pages for > > coherent DMA, which means on those 4xx powerpc's need to be non-cached. > > > > There were machines using that (440 based iirc), though I honestly > > can't tell if anybody still uses any of it. > > agp subsystem still seems to happily do that (vmalloc memory for > device access), never having been ported to dma apis (or well > converted to iommu drivers, which they kinda are really). So I think > this all still works exactly as back then, even with the kms radeon > drivers. Question really is whether we have users left, and I have no > clue about that either. > > Now if these boxes didn't ever have agp then I think we can get away > with deleting this, since we've already deleted the legacy radeon > driver. And that one used vmalloc for everything. The new kms one does > use the dma-api if the gpu isn't connected through agp. All radeons have a built in remapping table to handle non-AGP systems. On the earlier radeons it wasn't quite as performant as AGP, but it was always more reliable because AGP is AGP. Maybe it's time to let AGP go? Alex > -Daniel > > > Cheers, > > Ben. 
> > > > > -Daniel > > > > > > > > > > > Signed-off-by: Christoph Hellwig > > > > --- > > > > drivers/gpu/drm/drm_scatter.c | 11 +-- > > > > 1 file changed, 1 insertion(+), 10 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/drm_scatter.c > > > > b/drivers/gpu/drm/drm_scatter.c > > > > index ca520028b2cb..f4e6184d1877 100644 > > > > --- a/drivers/gpu/drm/drm_scatter.c > > > > +++ b/drivers/gpu/drm/drm_scatter.c > > > > @@ -43,15 +43,6 @@ > > > > > > > > #define DEBUG_SCATTER 0 > > > > > > > > -static inline void *drm_vmalloc_dma(unsigned long size) > > > > -{ > > > > -#if defined(__powerpc__) && defined(CONFIG_NOT_COHERENT_CACHE) > > > > - return __vmalloc(size, GFP_KERNEL, > > > > pgprot_noncached_wc(PAGE_KERNEL)); > > > > -#else > > > > - return vmalloc_32(size); > > > > -#endif > > > > -} > > > > - > > > > static void drm_sg_cleanup(struct drm_sg_mem * entry) > > > > { > > > > struct page *page; > > > > @@ -126,7 +117,7 @@ int drm_legacy_sg_alloc(struct drm_device *dev, > > > > void *data, > > > > return -ENOMEM; > > > > } > > > > > > > > - entry->virtual = drm_vmalloc_dma(pages << PAGE_SHIFT); > > > > + entry->virtual = vmalloc_32(pages << PAGE_SHIFT); > > > > if (!entry->virtual) { > > > > kfree(entry->busaddr); > > > > kfree(entry->pagelist); > > > > -- > > > > 2.25.1 > > > > > > > > > > > > > > > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch > ___ > dri-devel mailing list > dri-de...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] iommu/exynos: Rework initialization
Fix initialization after driver conversion to probe_device()/release_device(). Prepared on top of: https://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git/log/?h=iommu-probe-device Signed-off-by: Marek Szyprowski --- drivers/iommu/exynos-iommu.c | 80 +--- 1 file changed, 46 insertions(+), 34 deletions(-) diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c index f865c90..53c784f 100644 --- a/drivers/iommu/exynos-iommu.c +++ b/drivers/iommu/exynos-iommu.c @@ -565,6 +565,7 @@ static void sysmmu_tlb_invalidate_entry(struct sysmmu_drvdata *data, } static const struct iommu_ops exynos_iommu_ops; +static int exynos_iommu_initialize_owner(struct device *sysmmu); static int exynos_sysmmu_probe(struct platform_device *pdev) { @@ -573,6 +574,8 @@ static int exynos_sysmmu_probe(struct platform_device *pdev) struct sysmmu_drvdata *data; struct resource *res; + dev_info(dev, "%s %d\n", __func__, __LINE__); + data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL); if (!data) return -ENOMEM; @@ -649,6 +652,8 @@ static int exynos_sysmmu_probe(struct platform_device *pdev) pm_runtime_enable(dev); + exynos_iommu_initialize_owner(dev); + return 0; } @@ -1225,24 +1230,8 @@ static phys_addr_t exynos_iommu_iova_to_phys(struct iommu_domain *iommu_domain, static struct iommu_device *exynos_iommu_probe_device(struct device *dev) { - struct exynos_iommu_owner *owner = dev->archdata.iommu; - struct sysmmu_drvdata *data; - - if (!has_sysmmu(dev)) - return ERR_PTR(-ENODEV); - - list_for_each_entry(data, &owner->controllers, owner_node) { - /* -* SYSMMU will be runtime activated via device link -* (dependency) to its master device, so there are no -* direct calls to pm_runtime_get/put in this driver. 
-*/ - data->link = device_link_add(dev, data->sysmmu, -DL_FLAG_STATELESS | -DL_FLAG_PM_RUNTIME); - } - - return &owner->iommu; + /* this is called too early on ARM 32bit to do anything useful */ + return ERR_PTR(-ENODEV); } static void exynos_iommu_release_device(struct device *dev) @@ -1268,7 +1257,8 @@ static void exynos_iommu_release_device(struct device *dev) device_link_del(data->link); } -static int exynos_iommu_device_init(struct exynos_iommu_owner *owner) +static int exynos_iommu_device_init(struct device *dev, + struct exynos_iommu_owner *owner) { static u32 counter = 0; int ret; @@ -1287,6 +1277,12 @@ static int exynos_iommu_device_init(struct exynos_iommu_owner *owner) iommu_device_set_ops(&owner->iommu, &exynos_iommu_ops); + /* +* the above iommu_device_set_ops is not enough, initializing fwspec +* is also required +*/ + iommu_fwspec_init(dev, &dev->of_node->fwnode, &exynos_iommu_ops); + return 0; } @@ -1308,7 +1304,7 @@ static int exynos_owner_init(struct device *dev) if (!owner) return -ENOMEM; - ret = exynos_iommu_device_init(owner); + ret = exynos_iommu_device_init(dev, owner); if (ret) goto out_free_owner; @@ -1330,34 +1326,51 @@ static int exynos_owner_init(struct device *dev) return ret; } -static int exynos_iommu_of_xlate(struct device *dev, -struct of_phandle_args *spec) +static int exynos_iommu_dev_match_owner(struct device *dev, const void *data) +{ + const struct device *sysmmu = data; + struct device_node *np; + int idx = 0; + + do { + np = of_parse_phandle(dev->of_node, "iommus", idx++); + if (np == sysmmu->of_node) + return true; + } while (np); + + return false; +} + +static int exynos_iommu_initialize_owner(struct device *sysmmu) { - struct platform_device *sysmmu = of_find_device_by_node(spec->np); - struct sysmmu_drvdata *data, *entry; + struct sysmmu_drvdata *data = dev_get_drvdata(sysmmu); struct exynos_iommu_owner *owner; + struct device *dev; int ret; - if (!sysmmu) + dev = bus_find_device(&platform_bus_type, NULL, sysmmu, + exynos_iommu_dev_match_owner); + if (!dev) 
return -ENODEV; - data = platform_get_drvdata(sysmmu); - if (!data) - return -ENODEV; + dev_info(sysmmu, "found master device %s\n", dev_name(dev)); ret = exynos_owner_init(dev); if (ret) return ret; owner = dev->archdata.iommu; - - list_for_each_entry(entry, &owner->controllers, owner_node) - if (entry == data) - return 0; -
Re: [PATCH 0/2] iommu: Remove iommu_sva_ops::mm_exit()
On Thu, 9 Apr 2020 08:39:05 +0200 Jean-Philippe Brucker wrote: > On Wed, Apr 08, 2020 at 04:48:02PM -0700, Jacob Pan wrote: > > On Wed, 8 Apr 2020 19:32:18 -0300 > > Jason Gunthorpe wrote: > > > > > On Wed, Apr 08, 2020 at 02:35:52PM -0700, Jacob Pan wrote: > > > > > On Wed, Apr 08, 2020 at 11:35:52AM -0700, Jacob Pan wrote: > > > > > > Hi Jean, > > > > > > > > > > > > On Wed, 8 Apr 2020 16:04:25 +0200 > > > > > > Jean-Philippe Brucker wrote: > > > > > > > > > > > > > The IOMMU SVA API currently requires device drivers to > > > > > > > implement an mm_exit() callback, which stops device jobs > > > > > > > that do DMA. This function is called in the release() MMU > > > > > > > notifier, when an address space that is shared with a > > > > > > > device exits. > > > > > > > > > > > > > > It has been noted several time during discussions about > > > > > > > SVA that cancelling DMA jobs can be slow and complex, and > > > > > > > doing it in the release() notifier might cause > > > > > > > synchronization issues (patch 2 has more background). > > > > > > > Device drivers must in any case call unbind() to remove > > > > > > > their bond, after stopping DMA from a more favorable > > > > > > > context (release of a file descriptor). > > > > > > > > > > > > > > So after mm exits, rather than notifying device drivers, > > > > > > > we can hold on to the PASID until unbind(), ask IOMMU > > > > > > > drivers to silently abort DMA and Page Requests in the > > > > > > > meantime. This change should relieve the mmput() > > > > > > > path. > > > > > > > > > > > > I assume mm is destroyed after all the FDs are closed > > > > > > > > > > FDs do not hold a mmget(), but they may hold a mmgrab(), ie > > > > > anything using mmu_notifiers has to hold a grab until the > > > > > notifier is destroyed, which is often triggered by FD close. > > > > > > > > > Sorry, I don't get this. 
Are you saying we have to hold a > > > > mmgrab() between svm_bind/mmu_notifier_register and > > > > svm_unbind/mmu_notifier_unregister? > > > > > > Yes. This is done automatically for the caller inside the > > > mmu_notifier implementation. We now even store the mm_struct > > > pointer inside the notifier. > > > > > > Once a notifier is registered the mm_struct remains valid memory > > > until the notifier is unregistered. > > > > > > > Isn't the idea of mmu_notifier is to avoid holding the mm > > > > reference and rely on the notifier to tell us when mm is going > > > > away? > > > > > > The notifier only holds a mmgrab(), not a mmget() - this allows > > > exit_mmap to proceed, but the mm_struct memory remains. > > > > > > This is also probably why it is a bad idea to tie the lifetime of > > > something like a pasid to the mmdrop as a evil user could cause a > > > large number of mm structs to be released but not freed, probably > > > defeating cgroup limits and so forth (not sure) > > > > > > > It seems both Intel and AMD iommu drivers don't hold mmgrab > > > > after mmu_notifier_register. > > > > > > It is done internally to the implementation. > > > > > > > > So the exit_mmap() -> release() may happen before the FDs are > > > > > destroyed, but the final mmdrop() will be during some FD > > > > > release when the final mmdrop() happens. > > > > > > > > Do you mean mmdrop() is after FD release? > > > > > > Yes, it will be done by the mmu_notifier_unregister(), which > > > should be called during FD release if the iommu lifetime is > > > linked to some FD. > > > > If so, unbind is called in FD release should take care of > > > > everything, i.e. stops DMA, clear PASID context on IOMMU, flush > > > > PRS queue etc. > > > > > > Yes, this is the proper way, when the DMA is stopped and no use > > > of the PASID remains then you can drop the mmu notifier and > > > release the PASID entirely. 
If that is linked to the lifetime of > > > the FD then forget completely about the mm_struct lifetime, it > > > doesn't matter.. > > Got everything above, thanks a lot. > > > > If everything is in order with the FD close. Why do we need to > > "ask IOMMU drivers to silently abort DMA and Page Requests in the > > meantime." in mm_exit notifier? This will be done orderly in unbind > > anyway. > > When the process is killed, mm release can happen before fds are > released. If you look at do_exit() in kernel/exit.c: > > exit_mm() > mmput() > -> mmu release notifier > ... > exit_files() > close_files() > fput() > exit_task_work() > __fput() > -> unbind() > So unbind is coming anyway, the difference in handling in mmu release notifier is whether we silently drop DMA fault vs. reporting fault? If a process crashes during unbind, something already went seriously wrong, DMA fault is expected. I think having some error indication is useful, compared to "silently drop". Thanks, Jacob > Thanks, > Jean > > >
Re: [PATCH] iommu/exynos: Get rid of 'struct exynos_iommu_owner'
Hi Joerg, On 09.04.2020 13:46, Joerg Roedel wrote: > Hi Marek, > > I had some more thoughts and discussions with Robin about how to make > this work with the Exynos driver. The result is the patch below, can you > please test it and report back? Even better if you can fix up any > breakage it might cause :) I've checked and it works fine on top of ff68eb23308e6538ec7864c83d39540f423bbe90. However I'm not a fan of removing this 'owner' structure. It gave a nice abstraction for all the SYSMMU controllers for the given device (although most devices in the system have only one SYSMMU). Why is this structure a problem for your rework? I've also spent some time trying to fix exynos-iommu on top of your iommu-probe-device branch. I really wonder if it works on any ARM 32bit or 64bit systems for other IOMMUs. I got something working on ARM 32bit, but I have to move all the initialization from exynos_iommu_probe_device/exynos_iommu_of_xlate to exynos_sysmmu_probe(). I don't like such an approach, because I had to use bus_find_device() and manually find the owner for every SYSMMU controller in the system. This approach also lacks proper symmetry and a release path. The main problem after your conversion is the fact that ->probe_device() is called very early, before any other platform device (thus IOMMU controller) is probed. It doesn't handle EPROBE_DEFER too. The other issue I've noticed is that iommu_device_set_ops() is not enough to assign ops properly. I had to add iommu_fwspec_init(dev, &dev->of_node->fwnode, &exynos_iommu_ops) to ensure that the 'dev' gets proper iommu ops. I will send my patch in a few minutes to show you the changes. > >From 60a288509baa34df6a0bf437c977925a0a617c72 Mon Sep 17 00:00:00 2001 > From: Joerg Roedel > Date: Thu, 9 Apr 2020 13:38:18 +0200 > Subject: [PATCH] iommu/exynos: Get rid of 'struct exynos_iommu_owner' > > Remove 'struct exynos_iommu_owner' and replace it with a single-linked > list of 'struct sysmmu_drvdata'. 
The first item in the list acts as a > replacement for the previous exynos_iommu_owner structure. The > iommu_device member of the first list item is reported to the IOMMU > core code for the master device. > > Signed-off-by: Joerg Roedel > --- > drivers/iommu/exynos-iommu.c | 155 --- > 1 file changed, 88 insertions(+), 67 deletions(-) > > diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c > index 186ff5cc975c..e70eb360093f 100644 > --- a/drivers/iommu/exynos-iommu.c > +++ b/drivers/iommu/exynos-iommu.c > @@ -225,18 +225,6 @@ static const struct sysmmu_fault_info sysmmu_v5_faults[] > = { > { 20, REG_V5_FAULT_AW_VA, "AW SECURITY PROTECTION", IOMMU_FAULT_WRITE }, > }; > > -/* > - * This structure is attached to dev.archdata.iommu of the master device > - * on device add, contains a list of SYSMMU controllers defined by device > tree, > - * which are bound to given master device. It is usually referenced by > 'owner' > - * pointer. > -*/ > -struct exynos_iommu_owner { > - struct list_head controllers; /* list of sysmmu_drvdata.owner_node */ > - struct iommu_domain *domain;/* domain this device is attached */ > - struct mutex rpm_lock; /* for runtime pm of all sysmmus */ > -}; > - > /* >* This structure exynos specific generalization of struct iommu_domain. >* It contains list of SYSMMU controllers from all master devices, which has > @@ -271,13 +259,23 @@ struct sysmmu_drvdata { > bool active;/* current status */ > struct exynos_iommu_domain *domain; /* domain we belong to */ > struct list_head domain_node; /* node for domain clients list */ > - struct list_head owner_node;/* node for owner controllers list */ > + struct sysmmu_drvdata *next;/* Single-linked list to group SMMUs for > +one master. 
NULL means not in any > +list, ERR_PTR(-ENODEV) means end of > +list */ > + struct mutex rpm_lock; /* for runtime pm of all sysmmus */ > phys_addr_t pgtable;/* assigned page table structure */ > unsigned int version; /* our version */ > > struct iommu_device iommu; /* IOMMU core handle */ > }; > > +/* Helper to iterate over all SYSMMUs for a given platform device */ > +#define for_each_sysmmu(dev, drvdata)\ > + for (drvdata = (dev)->archdata.iommu; \ > + drvdata != ERR_PTR(-ENODEV); \ > + drvdata = drvdata->next) > + > static struct exynos_iommu_domain *to_exynos_domain(struct iommu_domain > *dom) > { > return container_of(dom, struct exynos_iommu_domain, domain); > } > @@ -624,6 +622,8 @@ static int exynos_sysmmu_probe(struct platform_device > *pdev) > > data->sysmmu = dev; > spin_lock_init(&data->lock); > + data->next = NULL;
RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace
Hi Jean, > From: Jean-Philippe Brucker > Sent: Thursday, April 9, 2020 4:15 PM > Subject: Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to > userspace > > On Wed, Apr 08, 2020 at 12:27:58PM +0200, Auger Eric wrote: > > Hi Yi, > > > > On 4/7/20 11:43 AM, Liu, Yi L wrote: > > > Hi Jean, > > > > > >> From: Jean-Philippe Brucker > > >> Sent: Friday, April 3, 2020 4:23 PM > > >> To: Auger Eric > > >> userspace > > >> > > >> On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote: > > >> header = vfio_info_cap_add(caps, sizeof(*nesting_cap), > > >> > VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1); > > >> @@ -2254,6 +2309,7 > > >> @@ static int vfio_iommu_info_add_nesting_cap(struct > > > vfio_iommu *iommu, > > >> /* nesting iommu type supports PASID requests > > >> (alloc/free) > */ > > >> nesting_cap->nesting_capabilities |= > VFIO_IOMMU_PASID_REQS; > > > What is the meaning for ARM? > > > > I think it's just a software capability exposed to userspace, on > > userspace side, it has a choice to use it or not. :-) The reason > > define it and report it in cap nesting is that I'd like to make the > > pasid alloc/free be available just for IOMMU with type > > VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not good > > for ARM. We can find a proper way to report the availability. > > >>> > > >>> Well it is more a question for jean-Philippe. Do we have a system wide > > >>> PASID allocation on ARM? > > >> > > >> We don't, the PASID spaces are per-VM on Arm, so this function should > > >> consult > the > > >> IOMMU driver before setting flags. As you said on patch 3, nested doesn't > > >> necessarily imply PASID support. The SMMUv2 does not support PASID but > > >> does > > >> support nesting stages 1 and 2 for the IOVA space. > > >> SMMUv3 support of PASID depends on HW capabilities. So I think this > > >> needs to > be > > >> finer grained: > > >> > > >> Does the container support: > > >> * VFIO_IOMMU_PASID_REQUEST? 
> > >> -> Yes for VT-d 3 > > >> -> No for Arm SMMU > > >> * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL? > > >> -> Yes for VT-d 3 > > >> -> Sometimes for SMMUv2 > > >> -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler due > > >> to > > >> PASID tables being in GPA space.) > > >> * VFIO_IOMMU_BIND_PASID_TABLE? > > >> -> No for VT-d > > >> -> Sometimes for SMMUv3 > > >> > > >> Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support. > > > > > > good summary. do you expect to see any > > > > > >> > > >> +nesting_cap->stage1_formats = formats; > > > as spotted by Kevin, since a single format is supported, rename > > > > ok, I was believing it may be possible on ARM or so. :-) will rename > > it. > > >> > > >> Yes I don't think an u32 is going to cut it for Arm :( We need to > > >> describe all sorts > of > > >> capabilities for page and PASID tables (granules, GPA size, ASID/PASID > > >> size, HW > > >> access/dirty, etc etc.) Just saying "Arm stage-1 format" wouldn't mean > > >> much. I > > >> guess we could have a secondary vendor capability for these? > > > > > > Actually, I'm wondering if we can define some formats to stands for a set > > > of > > > capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may indicates the 1st level > > > page table related caps (aw, a/d, SRE, EA and etc.). And vIOMMU can parse > > > the capabilities. > > > > But eventually do we really need all those capability getters? I mean > > can't we simply rely on the actual call to VFIO_IOMMU_BIND_GUEST_PGTBL() > > to detect any mismatch? Definitively the error handling may be heavier > > on userspace but can't we manage. > > I think we need to present these capabilities at boot time, long before > the guest triggers a bind(). For example if the host SMMU doesn't support > 16-bit ASID, we need to communicate that to the guest using vSMMU ID > registers or PROBE properties. 
Otherwise a bind() will succeed, but if the > guest uses 16-bit ASIDs in its CD, DMA will result in C_BAD_CD events > which we'll inject into the guest, for no apparent reason from their > perspective. > > In addition some VMMs may have fallbacks if shared page tables are not > available. They could fall back to a MAP/UNMAP interface, or simply not > present a vIOMMU to the guest. > Based on the comments, I think there is a need to report iommu caps in detail. So I guess the iommu uapi needs to provide something akin to the vfio cap chain. Please feel free to let me know your thoughts. :-) In vfio, we can define a cap as below: struct vfio_iommu_type1_info_cap_nesting { struct vfio_info_cap_header header; __u64 iommu_model; #define VFIO_IOMMU_PASID_REQS (1 << 0) #define VFIO_IOMMU_BIND_GPASID (1 << 1) #define VFIO_IOMMU_CACHE_INV (1 << 2) __u32 nesting_capabilities; __u32 pasid_bits; #define
Re: [PATCH 0/2] iommu: Remove iommu_sva_ops::mm_exit()
On Wed, Apr 08, 2020 at 04:49:01PM -0700, Fenghua Yu wrote: > > > Isn't the idea of mmu_notifier is to avoid holding the mm reference and > > > rely on the notifier to tell us when mm is going away? > > > > The notifier only holds a mmgrab(), not a mmget() - this allows > > exit_mmap to proceed, but the mm_struct memory remains. > > > > This is also probably why it is a bad idea to tie the lifetime of > > something like a pasid to the mmdrop as an evil user could cause a > > large number of mm structs to be released but not freed, probably > > defeating cgroup limits and so forth (not sure) > > The max number of processes can be limited for a user. PASID is per > address space so the max number of PASID can be limited for the user. > So the user cannot exhaust PASID so easily, right? With the patch Jacob pointed to, the PASID lifetime is tied to mmdrop, and I think (but did not check) that the cgroup accounting happens before mmdrop. > Binding the PASID to the mm and freeing the PASID in __mmdrop() can get > rid of the complexity. This is a much more reasonable explanation and should be in the patch commit instead of what is there. However, it still seems unnecessary to reach for arch code - the singleton notifier can be arranged to live until exit_mmap or fd release, whichever is longer, by putting a mmu_notifier_put() in the release() method. Jason
Re: [PATCH 0/2] iommu: Remove iommu_sva_ops::mm_exit()
On Wed, Apr 08, 2020 at 04:48:02PM -0700, Jacob Pan wrote: > > Yes, this is the proper way, when the DMA is stopped and no use of the > > PASID remains then you can drop the mmu notifier and release the PASID > > entirely. If that is linked to the lifetime of the FD then forget > > completely about the mm_struct lifetime, it doesn't matter.. > > > Got everything above, thanks a lot. > > If everything is in order with the FD close. Why do we need to > "ask IOMMU drivers to silently abort DMA and Page Requests in the > meantime." in mm_exit notifier? This will be done orderly in unbind > anyway. I thought this is exactly what Jean-Philippe is removing here; it is a bad idea for the reasons he explained. > > > Enforcing unbind upon FD close might be a precarious path, perhaps > > > that is why we have to deal with out of order situation? > > > > How so? You just put it in the FD release function :) > > I was thinking some driver may choose to defer unbind in some workqueue > etc. Doesn't really change anything, the lifetime of the PASID would be the lifetime of the notifier and the bind, and DMA can't continue without the notifier registered. Jason
Re: "DMAR hardware is malfunctioning" error when powering off
BTW, I normally don't run with "intel_iommu=on" (but I do have "CONFIG_IRQ_REMAP" turned on), as I figure that if I'm a single-user laptop and my only VM is VMWare (running Win10 guests), and I only use my Thunderbolt ports for my own docks, I really don't need an IOMMU anyway - but is there a benefit to having the IOMMU turned on (were it to work, that is) in my situation? -Kenny -- Kenneth R. Crudup Sr. SW Engineer, Scott County Consulting, Silicon Valley
Re: [PATCH 19/28] gpu/drm: remove the powerpc hack in drm_legacy_sg_alloc
On 09.04.20 at 10:54, Benjamin Herrenschmidt wrote: > On Wed, 2020-04-08 at 14:25 +0200, Daniel Vetter wrote: >> On Wed, Apr 08, 2020 at 01:59:17PM +0200, Christoph Hellwig wrote: >>> If this code was broken for non-coherent caches a crude powerpc hack >>> isn't going to help anyone else. Remove the hack as it is the last >>> user of __vmalloc passing a page protection flag other than PAGE_KERNEL. >> >> Well Ben added this to make stuff work on ppc, ofc the home grown dma >> layer in drm from back then isn't going to work in other places. I guess >> should have at least an ack from him, in case anyone still cares about >> this on ppc. Adding Ben to cc. > > This was due to some drivers (radeon ?) trying to use vmalloc pages for > coherent DMA, which means on those 4xx powerpc's need to be non-cached. > > There were machines using that (440 based iirc), though I honestly > can't tell if anybody still uses any of it. The first-gen amigaone platform (6xx/book3s) uses the radeon driver together with non-coherent DMA. However this only ever worked reliably for DRI1. br, Gerhard > Cheers, > Ben. 
> >> -Daniel >> >>> >>> Signed-off-by: Christoph Hellwig >>> --- >>> drivers/gpu/drm/drm_scatter.c | 11 +-- >>> 1 file changed, 1 insertion(+), 10 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/drm_scatter.c b/drivers/gpu/drm/drm_scatter.c >>> index ca520028b2cb..f4e6184d1877 100644 >>> --- a/drivers/gpu/drm/drm_scatter.c >>> +++ b/drivers/gpu/drm/drm_scatter.c >>> @@ -43,15 +43,6 @@ >>> >>> #define DEBUG_SCATTER 0 >>> >>> -static inline void *drm_vmalloc_dma(unsigned long size) >>> -{ >>> -#if defined(__powerpc__) && defined(CONFIG_NOT_COHERENT_CACHE) >>> - return __vmalloc(size, GFP_KERNEL, pgprot_noncached_wc(PAGE_KERNEL)); >>> -#else >>> - return vmalloc_32(size); >>> -#endif >>> -} >>> - >>> static void drm_sg_cleanup(struct drm_sg_mem * entry) >>> { >>> struct page *page; >>> @@ -126,7 +117,7 @@ int drm_legacy_sg_alloc(struct drm_device *dev, void >>> *data, >>> return -ENOMEM; >>> } >>> >>> - entry->virtual = drm_vmalloc_dma(pages << PAGE_SHIFT); >>> + entry->virtual = vmalloc_32(pages << PAGE_SHIFT); >>> if (!entry->virtual) { >>> kfree(entry->busaddr); >>> kfree(entry->pagelist); >>> -- >>> 2.25.1 >>> >> >> > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] iommu/exynos: Get rid of 'struct exynos_iommu_owner'
Hi Marek, I had some more thoughts and discussions with Robin about how to make this work with the Exynos driver. The result is the patch below, can you please test it and report back? Even better if you can fix up any breakage it might cause :) >From 60a288509baa34df6a0bf437c977925a0a617c72 Mon Sep 17 00:00:00 2001 From: Joerg Roedel Date: Thu, 9 Apr 2020 13:38:18 +0200 Subject: [PATCH] iommu/exynos: Get rid of 'struct exynos_iommu_owner' Remove 'struct exynos_iommu_owner' and replace it with a single-linked list of 'struct sysmmu_drvdata'. The first item in the list acts as a replacement for the previous exynos_iommu_owner structure. The iommu_device member of the first list item is reported to the IOMMU core code for the master device. Signed-off-by: Joerg Roedel --- drivers/iommu/exynos-iommu.c | 155 --- 1 file changed, 88 insertions(+), 67 deletions(-) diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c index 186ff5cc975c..e70eb360093f 100644 --- a/drivers/iommu/exynos-iommu.c +++ b/drivers/iommu/exynos-iommu.c @@ -225,18 +225,6 @@ static const struct sysmmu_fault_info sysmmu_v5_faults[] = { { 20, REG_V5_FAULT_AW_VA, "AW SECURITY PROTECTION", IOMMU_FAULT_WRITE }, }; -/* - * This structure is attached to dev.archdata.iommu of the master device - * on device add, contains a list of SYSMMU controllers defined by device tree, - * which are bound to given master device. It is usually referenced by 'owner' - * pointer. -*/ -struct exynos_iommu_owner { - struct list_head controllers; /* list of sysmmu_drvdata.owner_node */ - struct iommu_domain *domain;/* domain this device is attached */ - struct mutex rpm_lock; /* for runtime pm of all sysmmus */ -}; - /* * This structure exynos specific generalization of struct iommu_domain. 
* It contains list of SYSMMU controllers from all master devices, which has @@ -271,13 +259,23 @@ struct sysmmu_drvdata { bool active;/* current status */ struct exynos_iommu_domain *domain; /* domain we belong to */ struct list_head domain_node; /* node for domain clients list */ - struct list_head owner_node;/* node for owner controllers list */ + struct sysmmu_drvdata *next;/* Single-linked list to group SMMUs for + one master. NULL means not in any + list, ERR_PTR(-ENODEV) means end of + list */ + struct mutex rpm_lock; /* for runtime pm of all sysmmus */ phys_addr_t pgtable;/* assigned page table structure */ unsigned int version; /* our version */ struct iommu_device iommu; /* IOMMU core handle */ }; +/* Helper to iterate over all SYSMMUs for a given platform device */ +#define for_each_sysmmu(dev, drvdata) \ + for (drvdata = (dev)->archdata.iommu; \ +drvdata != ERR_PTR(-ENODEV); \ +drvdata = drvdata->next) + static struct exynos_iommu_domain *to_exynos_domain(struct iommu_domain *dom) { return container_of(dom, struct exynos_iommu_domain, domain); @@ -624,6 +622,8 @@ static int exynos_sysmmu_probe(struct platform_device *pdev) data->sysmmu = dev; spin_lock_init(>lock); + data->next = NULL; + mutex_init(>rpm_lock); ret = iommu_device_sysfs_add(>iommu, >dev, NULL, dev_name(data->sysmmu)); @@ -668,17 +668,20 @@ static int __maybe_unused exynos_sysmmu_suspend(struct device *dev) { struct sysmmu_drvdata *data = dev_get_drvdata(dev); struct device *master = data->master; + struct sysmmu_drvdata *master_data; - if (master) { - struct exynos_iommu_owner *owner = master->archdata.iommu; + if (!master) + return 0; - mutex_lock(>rpm_lock); - if (data->domain) { - dev_dbg(data->sysmmu, "saving state\n"); - __sysmmu_disable(data); - } - mutex_unlock(>rpm_lock); + master_data = master->archdata.iommu; + + mutex_lock(_data->rpm_lock); + if (data->domain) { + dev_dbg(data->sysmmu, "saving state\n"); + __sysmmu_disable(data); } + mutex_unlock(_data->rpm_lock); + return 0; } 
@@ -686,17 +689,20 @@ static int __maybe_unused exynos_sysmmu_resume(struct device *dev) { struct sysmmu_drvdata *data = dev_get_drvdata(dev); struct device *master = data->master; + struct sysmmu_drvdata *master_data; - if (master) { - struct exynos_iommu_owner *owner = master->archdata.iommu; + if (!master) + return 0; - mutex_lock(>rpm_lock); - if (data->domain) { -
Re: [PATCH v11 10/10] iommu/vt-d: Add custom allocator for IOASID
Hi Jacob, On 4/3/20 8:42 PM, Jacob Pan wrote: > When VT-d driver runs in the guest, PASID allocation must be > performed via virtual command interface. This patch registers a > custom IOASID allocator which takes precedence over the default > XArray based allocator. The resulting IOASID allocation will always > come from the host. This ensures that PASID namespace is system- > wide. > > Signed-off-by: Lu Baolu > Signed-off-by: Liu, Yi L > Signed-off-by: Jacob Pan > --- > drivers/iommu/intel-iommu.c | 84 > + > include/linux/intel-iommu.h | 2 ++ > 2 files changed, 86 insertions(+) > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c > index 045c5c08d71d..ff3f0386951f 100644 > --- a/drivers/iommu/intel-iommu.c > +++ b/drivers/iommu/intel-iommu.c > @@ -1732,6 +1732,9 @@ static void free_dmar_iommu(struct intel_iommu *iommu) > if (ecap_prs(iommu->ecap)) > intel_svm_finish_prq(iommu); > } > + if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap)) > + ioasid_unregister_allocator(>pasid_allocator); > + > #endif > } > > @@ -3266,6 +3269,84 @@ static int copy_translation_tables(struct intel_iommu > *iommu) > return ret; > } > > +#ifdef CONFIG_INTEL_IOMMU_SVM > +static ioasid_t intel_vcmd_ioasid_alloc(ioasid_t min, ioasid_t max, void > *data) > +{ > + struct intel_iommu *iommu = data; > + ioasid_t ioasid; > + > + if (!iommu) > + return INVALID_IOASID; > + /* > + * VT-d virtual command interface always uses the full 20 bit > + * PASID range. Host can partition guest PASID range based on > + * policies but it is out of guest's control. > + */ > + if (min < PASID_MIN || max > intel_pasid_max_id) > + return INVALID_IOASID; > + > + if (vcmd_alloc_pasid(iommu, )) > + return INVALID_IOASID; > + > + return ioasid; > +} > + > +static void intel_vcmd_ioasid_free(ioasid_t ioasid, void *data) > +{ > + struct intel_iommu *iommu = data; > + > + if (!iommu) > + return; > + /* > + * Sanity check the ioasid owner is done at upper layer, e.g. 
VFIO > + * We can only free the PASID when all the devices are unbound. > + */ > + if (ioasid_find(NULL, ioasid, NULL)) { > + pr_alert("Cannot free active IOASID %d\n", ioasid); > + return; > + } > + vcmd_free_pasid(iommu, ioasid); > +} > + > +static void register_pasid_allocator(struct intel_iommu *iommu) > +{ > + /* > + * If we are running in the host, no need for custom allocator > + * in that PASIDs are allocated from the host system-wide. > + */ > + if (!cap_caching_mode(iommu->cap)) > + return; > + > + if (!sm_supported(iommu)) { > + pr_warn("VT-d Scalable Mode not enabled, no PASID > allocation\n"); > + return; > + } > + > + /* > + * Register a custom PASID allocator if we are running in a guest, > + * guest PASID must be obtained via virtual command interface. > + * There can be multiple vIOMMUs in each guest but only one allocator > + * is active. All vIOMMU allocators will eventually be calling the same > + * host allocator. > + */ nit: I prefer if (!ecap_vcs(iommu->ecap) || !vccap_pasid(iommu->vccap)) returns; as it removes indents. > + if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap)) { > + pr_info("Register custom PASID allocator\n"); > + iommu->pasid_allocator.alloc = intel_vcmd_ioasid_alloc; > + iommu->pasid_allocator.free = intel_vcmd_ioasid_free; > + iommu->pasid_allocator.pdata = (void *)iommu; > + if (ioasid_register_allocator(>pasid_allocator)) { > + pr_warn("Custom PASID allocator failed, scalable mode > disabled\n"); > + /* > + * Disable scalable mode on this IOMMU if there > + * is no custom allocator. Mixing SM capable vIOMMU > + * and non-SM vIOMMU are not supported. 
> + */ > + intel_iommu_sm = 0; > + } > + } > +} > +#endif > + > static int __init init_dmars(void) > { > struct dmar_drhd_unit *drhd; > @@ -3383,6 +3464,9 @@ static int __init init_dmars(void) >*/ > for_each_active_iommu(iommu, drhd) { > iommu_flush_write_buffer(iommu); > +#ifdef CONFIG_INTEL_IOMMU_SVM > + register_pasid_allocator(iommu); > +#endif > iommu_set_root_entry(iommu); > iommu->flush.flush_context(iommu, 0, 0, 0, > DMA_CCMD_GLOBAL_INVL); > iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH); > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h > index f652db3198d9..e122cb30388e 100644 > --- a/include/linux/intel-iommu.h >
Re: [PATCH v11 08/10] iommu/vt-d: Cache virtual command capability register
Hi Jacob, On 4/3/20 8:42 PM, Jacob Pan wrote: > Virtual command registers are used in the guest only, to prevent > vmexit cost, we cache the capability and store it during initialization. > > Signed-off-by: Jacob Pan > Reviewed-by: Eric Auger > Reviewed-by: Lu Baolu nit: following Joerg's comment on 02/10, may be worth squashing it into the patch that uses it (10/10) Thanks Eric > > --- > v7 Reviewed by Eric & Baolu > --- > > Signed-off-by: Jacob Pan > --- > drivers/iommu/dmar.c| 1 + > include/linux/intel-iommu.h | 5 + > 2 files changed, 6 insertions(+) > > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c > index 4d6b7b5b37ee..3b36491c8bbb 100644 > --- a/drivers/iommu/dmar.c > +++ b/drivers/iommu/dmar.c > @@ -963,6 +963,7 @@ static int map_iommu(struct intel_iommu *iommu, u64 > phys_addr) > warn_invalid_dmar(phys_addr, " returns all ones"); > goto unmap; > } > + iommu->vccap = dmar_readq(iommu->reg + DMAR_VCCAP_REG); > > /* the registers might be more than one page */ > map_size = max_t(int, ecap_max_iotlb_offset(iommu->ecap), > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h > index f775bb825343..e02a96848586 100644 > --- a/include/linux/intel-iommu.h > +++ b/include/linux/intel-iommu.h > @@ -194,6 +194,9 @@ > #define ecap_max_handle_mask(e) ((e >> 20) & 0xf) > #define ecap_sc_support(e) ((e >> 7) & 0x1) /* Snooping Control */ > > +/* Virtual command interface capability */ > +#define vccap_pasid(v) ((v & DMA_VCS_PAS)) /* PASID allocation > */ > + > /* IOTLB_REG */ > #define DMA_TLB_FLUSH_GRANU_OFFSET 60 > #define DMA_TLB_GLOBAL_FLUSH (((u64)1) << 60) > @@ -287,6 +290,7 @@ > > /* PRS_REG */ > #define DMA_PRS_PPR ((u32)1) > +#define DMA_VCS_PAS ((u64)1) > > #define IOMMU_WAIT_OP(iommu, offset, op, cond, sts) \ > do { \ > @@ -562,6 +566,7 @@ struct intel_iommu { > u64 reg_size; /* size of hw register set */ > u64 cap; > u64 ecap; > + u64 vccap; > u32 gcmd; /* Holds TE, EAFL. 
Don't need SRTP, SFL, WBF */ > raw_spinlock_t register_lock; /* protect register handling */ > int seq_id; /* sequence id of the iommu */ >
Re: [PATCH 1/2] uacce: Remove mm_exit() op
On Thu, Apr 09, 2020 at 05:07:34PM +0800, Zhangfei Gao wrote: > > > On 2020/4/8 10:04 PM, Jean-Philippe Brucker wrote: > > The mm_exit() op will be removed from the SVA API. When a process dies > > and its mm goes away, the IOMMU driver won't notify device drivers > > anymore. Drivers should expect to handle a lot more aborted DMA. On the > > upside, it does greatly simplify the queue management. > > > > The uacce_mm struct, that tracks all queues bound to an mm, was only > > used by the mm_exit() callback. Remove it. > > > > Signed-off-by: Jean-Philippe Brucker > Thanks Jean for doing this. > > Tested-by: Zhangfei Gao > > Except one line. > > -static void uacce_mm_put(struct uacce_queue *q) > > +static void uacce_unbind_queue(struct uacce_queue *q) > > { > > - struct uacce_mm *uacce_mm = q->uacce_mm; > > - > > - lockdep_assert_held(&q->uacce->mm_lock); > > - > > - mutex_lock(&uacce_mm->lock); > > - list_del(&q->list); > > - mutex_unlock(&uacce_mm->lock); > > - > > - if (list_empty(&uacce_mm->queues)) { > > - if (uacce_mm->handle) > > - iommu_sva_unbind_device(uacce_mm->handle); > > - list_del(&uacce_mm->list); > > - kfree(uacce_mm); > > - } > > + if (!q->handle) > > + return; > > + iommu_sva_unbind_device(q->handle); > + q->handle = 0; > > Otherwise iommu_sva_unbind_device may be called twice. > Since uacce_unbind_queue can be called by uacce_remove and uacce_fops_release. Thanks, I'll add it in v2 Jean
Re: [PATCH 19/28] gpu/drm: remove the powerpc hack in drm_legacy_sg_alloc
On Thu, Apr 9, 2020 at 10:54 AM Benjamin Herrenschmidt wrote: > > On Wed, 2020-04-08 at 14:25 +0200, Daniel Vetter wrote: > > On Wed, Apr 08, 2020 at 01:59:17PM +0200, Christoph Hellwig wrote: > > > If this code was broken for non-coherent caches a crude powerpc hack > > > isn't going to help anyone else. Remove the hack as it is the last > > > user of __vmalloc passing a page protection flag other than PAGE_KERNEL. > > > > Well Ben added this to make stuff work on ppc, ofc the home grown dma > > layer in drm from back then isn't going to work in other places. I guess > > should have at least an ack from him, in case anyone still cares about > > this on ppc. Adding Ben to cc. > > This was due to some drivers (radeon ?) trying to use vmalloc pages for > coherent DMA, which means on those 4xx powerpc's need to be non-cached. > > There were machines using that (440 based iirc), though I honestly > can't tell if anybody still uses any of it. agp subsystem still seems to happily do that (vmalloc memory for device access), never having been ported to dma apis (or well converted to iommu drivers, which they kinda are really). So I think this all still works exactly as back then, even with the kms radeon drivers. Question really is whether we have users left, and I have no clue about that either. Now if these boxes didn't ever have agp then I think we can get away with deleting this, since we've already deleted the legacy radeon driver. And that one used vmalloc for everything. The new kms one does use the dma-api if the gpu isn't connected through agp. -Daniel > Cheers, > Ben. 
> > > -Daniel > > > > > > > > Signed-off-by: Christoph Hellwig > > > --- > > > drivers/gpu/drm/drm_scatter.c | 11 +-- > > > 1 file changed, 1 insertion(+), 10 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/drm_scatter.c b/drivers/gpu/drm/drm_scatter.c > > > index ca520028b2cb..f4e6184d1877 100644 > > > --- a/drivers/gpu/drm/drm_scatter.c > > > +++ b/drivers/gpu/drm/drm_scatter.c > > > @@ -43,15 +43,6 @@ > > > > > > #define DEBUG_SCATTER 0 > > > > > > -static inline void *drm_vmalloc_dma(unsigned long size) > > > -{ > > > -#if defined(__powerpc__) && defined(CONFIG_NOT_COHERENT_CACHE) > > > - return __vmalloc(size, GFP_KERNEL, pgprot_noncached_wc(PAGE_KERNEL)); > > > -#else > > > - return vmalloc_32(size); > > > -#endif > > > -} > > > - > > > static void drm_sg_cleanup(struct drm_sg_mem * entry) > > > { > > > struct page *page; > > > @@ -126,7 +117,7 @@ int drm_legacy_sg_alloc(struct drm_device *dev, void > > > *data, > > > return -ENOMEM; > > > } > > > > > > - entry->virtual = drm_vmalloc_dma(pages << PAGE_SHIFT); > > > + entry->virtual = vmalloc_32(pages << PAGE_SHIFT); > > > if (!entry->virtual) { > > > kfree(entry->busaddr); > > > kfree(entry->pagelist); > > > -- > > > 2.25.1 > > > > > > > > -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
On Thu, Apr 09, 2020 at 09:15:29AM +, Liu, Yi L wrote: > > From: Jean-Philippe Brucker > > Sent: Thursday, April 9, 2020 4:29 PM > > To: Liu, Yi L > > > > On Tue, Apr 07, 2020 at 10:33:25AM +, Liu, Yi L wrote: > > > Hi Jean, > > > > > > > From: Jean-Philippe Brucker < jean-phili...@linaro.org > > > > > Sent: Friday, April 3, 2020 4:35 PM > > > > Subject: Re: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host > > > > > > > > On Thu, Apr 02, 2020 at 08:05:29AM +, Liu, Yi L wrote: > > > > > > > > > static long vfio_iommu_type1_ioctl(void *iommu_data, > > > > > > > > > default: > > > > > > > > > return -EINVAL; > > > > > > > > > } > > > > > > > > > + > > > > > > > > > + } else if (cmd == VFIO_IOMMU_BIND) { > > > > > > > > > > > > > > > > BIND what? VFIO_IOMMU_BIND_PASID sounds clearer to me. > > > > > > > > > > > > > > Emm, it's up to the flags to indicate bind what. It was proposed > > > > > > > to > > > > > > > cover the three cases below: > > > > > > > a) BIND/UNBIND_GPASID > > > > > > > b) BIND/UNBIND_GPASID_TABLE > > > > > > > c) BIND/UNBIND_PROCESS > > > > > > > > > > > > > > So it's called VFIO_IOMMU_BIND. > > > > > > > > > > > > but aren't they all about PASID related binding? > > > > > > > > > > yeah, I can rename it. :-) > > > > > > > > I don't know if anyone intends to implement it, but SMMUv2 supports > > > > nesting translation without any PASID support. For that case the name > > > > VFIO_IOMMU_BIND_GUEST_PGTBL without "PASID" anywhere makes more > > sense. > > > > Ideally we'd also use a neutral name for the IOMMU API instead of > > > > bind_gpasid(), but that's easier to change later. > > > > > > I agree VFIO_IOMMU_BIND is somehow not straight-forward. Especially, it > > > may > > > cause confusion when thinking about VFIO_SET_IOMMU. How about using > > > VFIO_NESTING_IOMMU_BIND_STAGE1 to cover a) and b)? And has another > > > VFIO_BIND_PROCESS in future for the SVA bind case. 
> > > > I think minimizing the number of ioctls is more important than finding the > > ideal name. VFIO_IOMMU_BIND was fine to me, but if it's too vague then > > rename it to VFIO_IOMMU_BIND_PASID and we'll just piggy-back on it for > > non-PASID things (they should be rare enough). > maybe we can start with VFIO_IOMMU_BIND_PASID. Actually, there is > also a discussion on reusing the same ioctl and vfio structure for > pasid_alloc/free, bind/unbind_gpasid. and cache_inv. how about your > opinion? Merging bind with unbind and alloc with free makes sense. I'd leave at least invalidate a separate ioctl, because alloc/bind/unbind/free are setup functions while invalidate is a runtime thing and performance sensitive. But I can't see a good reason not to merge them all together, so either way is fine by me. Thanks, Jean ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
> From: Jean-Philippe Brucker > Sent: Thursday, April 9, 2020 4:29 PM > To: Liu, Yi L > > On Tue, Apr 07, 2020 at 10:33:25AM +, Liu, Yi L wrote: > > Hi Jean, > > > > > From: Jean-Philippe Brucker < jean-phili...@linaro.org > > > > Sent: Friday, April 3, 2020 4:35 PM > > > Subject: Re: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host > > > > > > On Thu, Apr 02, 2020 at 08:05:29AM +, Liu, Yi L wrote: > > > > > > > > static long vfio_iommu_type1_ioctl(void *iommu_data, > > > > > > > > default: > > > > > > > > return -EINVAL; > > > > > > > > } > > > > > > > > + > > > > > > > > + } else if (cmd == VFIO_IOMMU_BIND) { > > > > > > > > > > > > > > BIND what? VFIO_IOMMU_BIND_PASID sounds clearer to me. > > > > > > > > > > > > Emm, it's up to the flags to indicate bind what. It was proposed to > > > > > > cover the three cases below: > > > > > > a) BIND/UNBIND_GPASID > > > > > > b) BIND/UNBIND_GPASID_TABLE > > > > > > c) BIND/UNBIND_PROCESS > > > > > > > > > > > > So it's called VFIO_IOMMU_BIND. > > > > > > > > > > but aren't they all about PASID related binding? > > > > > > > > yeah, I can rename it. :-) > > > > > > I don't know if anyone intends to implement it, but SMMUv2 supports > > > nesting translation without any PASID support. For that case the name > > > VFIO_IOMMU_BIND_GUEST_PGTBL without "PASID" anywhere makes more > sense. > > > Ideally we'd also use a neutral name for the IOMMU API instead of > > > bind_gpasid(), but that's easier to change later. > > > > I agree VFIO_IOMMU_BIND is somehow not straight-forward. Especially, it may > > cause confusion when thinking about VFIO_SET_IOMMU. How about using > > VFIO_NESTING_IOMMU_BIND_STAGE1 to cover a) and b)? And has another > > VFIO_BIND_PROCESS in future for the SVA bind case. > > I think minimizing the number of ioctls is more important than finding the > ideal name. 
VFIO_IOMMU_BIND was fine to me, but if it's too vague then > rename it to VFIO_IOMMU_BIND_PASID and we'll just piggy-back on it for > non-PASID things (they should be rare enough). maybe we can start with VFIO_IOMMU_BIND_PASID. Actually, there is also a discussion on reusing the same ioctl and vfio structure for pasid_alloc/free, bind/unbind_gpasid. and cache_inv. how about your opinion? https://lkml.org/lkml/2020/4/3/833 Regards, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 19/28] gpu/drm: remove the powerpc hack in drm_legacy_sg_alloc
On Wed, 2020-04-08 at 14:25 +0200, Daniel Vetter wrote: > On Wed, Apr 08, 2020 at 01:59:17PM +0200, Christoph Hellwig wrote: > > If this code was broken for non-coherent caches a crude powerpc hack > > isn't going to help anyone else. Remove the hack as it is the last > > user of __vmalloc passing a page protection flag other than PAGE_KERNEL. > > Well Ben added this to make stuff work on ppc, ofc the home grown dma > layer in drm from back then isn't going to work in other places. I guess > should have at least an ack from him, in case anyone still cares about > this on ppc. Adding Ben to cc. This was due to some drivers (radeon ?) trying to use vmalloc pages for coherent DMA, which means on those 4xx powerpc's need to be non-cached. There were machines using that (440 based iirc), though I honestly can't tell if anybody still uses any of it. Cheers, Ben. > -Daniel > > > > > Signed-off-by: Christoph Hellwig > > --- > > drivers/gpu/drm/drm_scatter.c | 11 +-- > > 1 file changed, 1 insertion(+), 10 deletions(-) > > > > diff --git a/drivers/gpu/drm/drm_scatter.c b/drivers/gpu/drm/drm_scatter.c > > index ca520028b2cb..f4e6184d1877 100644 > > --- a/drivers/gpu/drm/drm_scatter.c > > +++ b/drivers/gpu/drm/drm_scatter.c > > @@ -43,15 +43,6 @@ > > > > #define DEBUG_SCATTER 0 > > > > -static inline void *drm_vmalloc_dma(unsigned long size) > > -{ > > -#if defined(__powerpc__) && defined(CONFIG_NOT_COHERENT_CACHE) > > - return __vmalloc(size, GFP_KERNEL, pgprot_noncached_wc(PAGE_KERNEL)); > > -#else > > - return vmalloc_32(size); > > -#endif > > -} > > - > > static void drm_sg_cleanup(struct drm_sg_mem * entry) > > { > > struct page *page; > > @@ -126,7 +117,7 @@ int drm_legacy_sg_alloc(struct drm_device *dev, void > > *data, > > return -ENOMEM; > > } > > > > - entry->virtual = drm_vmalloc_dma(pages << PAGE_SHIFT); > > + entry->virtual = vmalloc_32(pages << PAGE_SHIFT); > > if (!entry->virtual) { > > kfree(entry->busaddr); > > kfree(entry->pagelist); > > -- > > 2.25.1 > > > > 
Re: [PATCH 1/2] uacce: Remove mm_exit() op
On 2020/4/8 10:04 PM, Jean-Philippe Brucker wrote: The mm_exit() op will be removed from the SVA API. When a process dies and its mm goes away, the IOMMU driver won't notify device drivers anymore. Drivers should expect to handle a lot more aborted DMA. On the upside, it does greatly simplify the queue management. The uacce_mm struct, that tracks all queues bound to an mm, was only used by the mm_exit() callback. Remove it. Signed-off-by: Jean-Philippe Brucker Thanks Jean for doing this. Tested-by: Zhangfei Gao Except one line. -static void uacce_mm_put(struct uacce_queue *q) +static void uacce_unbind_queue(struct uacce_queue *q) { - struct uacce_mm *uacce_mm = q->uacce_mm; - - lockdep_assert_held(&q->uacce->mm_lock); - - mutex_lock(&uacce_mm->lock); - list_del(&q->list); - mutex_unlock(&uacce_mm->lock); - - if (list_empty(&uacce_mm->queues)) { - if (uacce_mm->handle) - iommu_sva_unbind_device(uacce_mm->handle); - list_del(&uacce_mm->list); - kfree(uacce_mm); - } + if (!q->handle) + return; + iommu_sva_unbind_device(q->handle); + q->handle = 0; Otherwise iommu_sva_unbind_device may be called twice, since uacce_unbind_queue can be called by uacce_remove and uacce_fops_release. Thanks
Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace
Hi Jean, On 4/9/20 10:14 AM, Jean-Philippe Brucker wrote: > On Wed, Apr 08, 2020 at 12:27:58PM +0200, Auger Eric wrote: >> Hi Yi, >> >> On 4/7/20 11:43 AM, Liu, Yi L wrote: >>> Hi Jean, >>> From: Jean-Philippe Brucker Sent: Friday, April 3, 2020 4:23 PM To: Auger Eric userspace On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote: header = vfio_info_cap_add(caps, sizeof(*nesting_cap), VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1); @@ -2254,6 +2309,7 @@ static int vfio_iommu_info_add_nesting_cap(struct >>> vfio_iommu *iommu, /* nesting iommu type supports PASID requests (alloc/free) */ nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS; >>> What is the meaning for ARM? >> >> I think it's just a software capability exposed to userspace, on >> userspace side, it has a choice to use it or not. :-) The reason >> define it and report it in cap nesting is that I'd like to make the >> pasid alloc/free be available just for IOMMU with type >> VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not good >> for ARM. We can find a proper way to report the availability. > > Well it is more a question for jean-Philippe. Do we have a system wide > PASID allocation on ARM? We don't, the PASID spaces are per-VM on Arm, so this function should consult the IOMMU driver before setting flags. As you said on patch 3, nested doesn't necessarily imply PASID support. The SMMUv2 does not support PASID but does support nesting stages 1 and 2 for the IOVA space. SMMUv3 support of PASID depends on HW capabilities. So I think this needs to be finer grained: Does the container support: * VFIO_IOMMU_PASID_REQUEST? -> Yes for VT-d 3 -> No for Arm SMMU * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL? -> Yes for VT-d 3 -> Sometimes for SMMUv2 -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler due to PASID tables being in GPA space.) * VFIO_IOMMU_BIND_PASID_TABLE? -> No for VT-d -> Sometimes for SMMUv3 Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support. 
>>> >>> good summary. do you expect to see any >>> + nesting_cap->stage1_formats = formats; >>> as spotted by Kevin, since a single format is supported, rename >> >> ok, I was believing it may be possible on ARM or so. :-) will rename >> it. Yes I don't think an u32 is going to cut it for Arm :( We need to describe all sorts of capabilities for page and PASID tables (granules, GPA size, ASID/PASID size, HW access/dirty, etc etc.) Just saying "Arm stage-1 format" wouldn't mean much. I guess we could have a secondary vendor capability for these? >>> >>> Actually, I'm wondering if we can define some formats to stands for a set of >>> capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may indicates the 1st level >>> page table related caps (aw, a/d, SRE, EA and etc.). And vIOMMU can parse >>> the capabilities. >> >> But eventually do we really need all those capability getters? I mean >> can't we simply rely on the actual call to VFIO_IOMMU_BIND_GUEST_PGTBL() >> to detect any mismatch? Definitively the error handling may be heavier >> on userspace but can't we manage. > > I think we need to present these capabilities at boot time, long before > the guest triggers a bind(). For example if the host SMMU doesn't support > 16-bit ASID, we need to communicate that to the guest using vSMMU ID > registers or PROBE properties. Otherwise a bind() will succeed, but if the > guest uses 16-bit ASIDs in its CD, DMA will result in C_BAD_CD events > which we'll inject into the guest, for no apparent reason from their > perspective. OK I understand this case as in this situation we may be able to change the way to iommu is exposed to the guest. > > In addition some VMMs may have fallbacks if shared page tables are not > available. They could fall back to a MAP/UNMAP interface, or simply not > present a vIOMMU to the guest. fair enough, there is a need for such capability checker in the mid term. 
But this patch introduces the capability to check whether system wide PASID alloc is supported and this may not be requested at that stage for the whole vSVM integration? Thanks Eric > > Thanks, > Jean > >> My fear is we end up with an overly >> complex series. This capability getter may be interesting if we can >> switch to a fallback implementation but here I guess we don't have any >> fallback. With smmuv3 nested stage we don't have any fallback solution >> either. For the versions, it is different because the userspace shall be >> able to adapt (or not) to the max version supported by the kernel. >> >> Thanks >> >> Eric >>> >>> Regards, >>> Yi
Re: [PATCH v11 07/10] iommu/vt-d: Add svm/sva invalidate function
Hi Jacob,

On 4/3/20 8:42 PM, Jacob Pan wrote:
> When Shared Virtual Address (SVA) is enabled for a guest OS via
> vIOMMU, we need to provide invalidation support at IOMMU API and driver
> level. This patch adds Intel VT-d specific function to implement
> iommu passdown invalidate API for shared virtual address.
>
> The use case is for supporting caching structure invalidation
> of assigned SVM capable devices. Emulated IOMMU exposes queue
> invalidation capability and passes down all descriptors from the guest
> to the physical IOMMU.
>
> The assumption is that guest to host device ID mapping should be
> resolved prior to calling IOMMU driver. Based on the device handle,
> host IOMMU driver can replace certain fields before submit to the
> invalidation queue.
>
> ---
> v11 - Removed 2D map array, use -EINVAL in granularity lookup array.
> Fixed devTLB invalidation granularity mapping. Disregard G=1 case
> and use address selective invalidation only.
>
> v7 review fixed in v10
> ---
>
> Signed-off-by: Jacob Pan
> Signed-off-by: Ashok Raj
> Signed-off-by: Liu, Yi L
> ---
>  drivers/iommu/intel-iommu.c | 158
>  1 file changed, 158 insertions(+)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 94c7993dac6a..045c5c08d71d 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5594,6 +5594,163 @@ static void intel_iommu_aux_detach_device(struct iommu_domain *domain,
> 	aux_domain_remove_dev(to_dmar_domain(domain), dev);
> }
>
> +/*
> + * 2D array for converting and sanitizing IOMMU generic TLB granularity to
> + * VT-d granularity. Invalidation is typically included in the unmap operation
> + * as a result of DMA or VFIO unmap. However, for assigned devices guest
> + * owns the first level page tables. Invalidations of translation caches in the
> + * guest are trapped and passed down to the host.
> + *
> + * vIOMMU in the guest will only expose first level page tables, therefore
> + * we do not support IOTLB granularity for request without PASID (second
> + * level).
> + *
> + * For example, to find the VT-d granularity encoding for IOTLB
> + * type and page selective granularity within PASID:
> + * X: indexed by iommu cache type
> + * Y: indexed by enum iommu_inv_granularity
> + * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
> + */
> +
> +const static int inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
> +	/*
> +	 * PASID based IOTLB invalidation: PASID selective (per PASID),
> +	 * page selective (address granularity)
> +	 */
> +	{-EINVAL, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID},
> +	/* PASID based dev TLBs */
> +	{-EINVAL, -EINVAL, QI_DEV_IOTLB_GRAN_PASID_SEL},
> +	/* PASID cache */
> +	{-EINVAL, -EINVAL, -EINVAL}
> +};
> +
> +static inline int to_vtd_granularity(int type, int granu)
> +{
> +	return inv_type_granu_table[type][granu];
> +}
> +
> +static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules)
> +{
> +	u64 nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT;
> +
> +	/* VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9 for 2MB, etc.
> +	 * IOMMU cache invalidate API passes granu_size in bytes, and number of
> +	 * granu size in contiguous memory.
> +	 */
> +	return order_base_2(nr_pages);
> +}
> +
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +static int intel_iommu_sva_invalidate(struct iommu_domain *domain,
> +		struct device *dev, struct iommu_cache_invalidate_info *inv_info)
> +{
> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> +	struct device_domain_info *info;
> +	struct intel_iommu *iommu;
> +	unsigned long flags;
> +	int cache_type;
> +	u8 bus, devfn;
> +	u16 did, sid;
> +	int ret = 0;
> +	u64 size = 0;
> +
> +	if (!inv_info || !dmar_domain ||
> +	    inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
> +		return -EINVAL;
> +
> +	if (!dev || !dev_is_pci(dev))
> +		return -ENODEV;

Check (domain->flags & DOMAIN_FLAG_NESTING_MODE)?

> +
> +	iommu = device_to_iommu(dev, &bus, &devfn);
> +	if (!iommu)
> +		return -ENODEV;
> +
> +	spin_lock_irqsave(&device_domain_lock, flags);
> +	spin_lock(&iommu->lock);
> +	info = iommu_support_dev_iotlb(dmar_domain, iommu, bus, devfn);
> +	if (!info) {
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +	did = dmar_domain->iommu_did[iommu->seq_id];
> +	sid = PCI_DEVID(bus, devfn);
> +
> +	/* Size is only valid in non-PASID selective invalidation */
> +	if (inv_info->granularity != IOMMU_INV_GRANU_PASID)
> +		size = to_vtd_size(inv_info->addr_info.granule_size,
> +				   inv_info->addr_info.nb_granules);
> +
> +	for_each_set_bit(cache_type, (unsigned long *)&inv_info->cache, IOMMU_CACHE_INV_TYPE_NR) {
> +		int
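The granularity lookup and size encoding quoted above can be exercised in isolation. Below is a userspace sketch of the same logic; the QI_* values here are placeholder constants (the real encodings live in the intel-iommu headers) and order_base_2() is open-coded:

```c
#include <assert.h>
#include <errno.h>

/* Placeholder stand-ins for the VT-d QI_* granularity encodings. */
enum { TYPE_IOTLB, TYPE_DEV_IOTLB, TYPE_PASID, TYPE_NR };
enum { GRANU_DOMAIN, GRANU_PASID, GRANU_ADDR, GRANU_NR };

#define QI_GRAN_NONG_PASID          1
#define QI_GRAN_PSI_PASID           2
#define QI_DEV_IOTLB_GRAN_PASID_SEL 3

/* X: cache type, Y: generic granularity; value: VT-d encoding or -EINVAL */
static const int inv_type_granu_table[TYPE_NR][GRANU_NR] = {
    {-EINVAL, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID},
    {-EINVAL, -EINVAL, QI_DEV_IOTLB_GRAN_PASID_SEL},
    {-EINVAL, -EINVAL, -EINVAL},
};

static int to_vtd_granularity(int type, int granu)
{
    return inv_type_granu_table[type][granu];
}

/* VT-d size field: log2 of the number of 4K pages covered. */
static unsigned long long to_vtd_size(unsigned long long granule_size,
                                      unsigned long long nr_granules)
{
    unsigned long long nr_pages = (granule_size * nr_granules) >> 12;
    unsigned long long order = 0;

    while ((1ULL << order) < nr_pages)  /* open-coded order_base_2() */
        order++;
    return order;
}
```

A 2MB range passed as 512 contiguous 4K granules or as one 2MB granule yields the same encoding (9), which is the property the helper relies on.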
Re: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host
On Tue, Apr 07, 2020 at 10:33:25AM +, Liu, Yi L wrote: > Hi Jean, > > > From: Jean-Philippe Brucker < jean-phili...@linaro.org > > > Sent: Friday, April 3, 2020 4:35 PM > > Subject: Re: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host > > > > On Thu, Apr 02, 2020 at 08:05:29AM +, Liu, Yi L wrote: > > > > > > > static long vfio_iommu_type1_ioctl(void *iommu_data, > > > > > > > default: > > > > > > > return -EINVAL; > > > > > > > } > > > > > > > + > > > > > > > + } else if (cmd == VFIO_IOMMU_BIND) { > > > > > > > > > > > > BIND what? VFIO_IOMMU_BIND_PASID sounds clearer to me. > > > > > > > > > > Emm, it's up to the flags to indicate bind what. It was proposed to > > > > > cover the three cases below: > > > > > a) BIND/UNBIND_GPASID > > > > > b) BIND/UNBIND_GPASID_TABLE > > > > > c) BIND/UNBIND_PROCESS > > > > > > > > > > So it's called VFIO_IOMMU_BIND. > > > > > > > > but aren't they all about PASID related binding? > > > > > > yeah, I can rename it. :-) > > > > I don't know if anyone intends to implement it, but SMMUv2 supports > > nesting translation without any PASID support. For that case the name > > VFIO_IOMMU_BIND_GUEST_PGTBL without "PASID" anywhere makes more sense. > > Ideally we'd also use a neutral name for the IOMMU API instead of > > bind_gpasid(), but that's easier to change later. > > I agree VFIO_IOMMU_BIND is somehow not straight-forward. Especially, it may > cause confusion when thinking about VFIO_SET_IOMMU. How about using > VFIO_NESTING_IOMMU_BIND_STAGE1 to cover a) and b)? And has another > VFIO_BIND_PROCESS in future for the SVA bind case. I think minimizing the number of ioctls is more important than finding the ideal name. VFIO_IOMMU_BIND was fine to me, but if it's too vague then rename it to VFIO_IOMMU_BIND_PASID and we'll just piggy-back on it for non-PASID things (they should be rare enough). Thanks, Jean
Re: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace
On Wed, Apr 08, 2020 at 12:27:58PM +0200, Auger Eric wrote: > Hi Yi, > > On 4/7/20 11:43 AM, Liu, Yi L wrote: > > Hi Jean, > > > >> From: Jean-Philippe Brucker > >> Sent: Friday, April 3, 2020 4:23 PM > >> To: Auger Eric > >> userspace > >> > >> On Wed, Apr 01, 2020 at 03:01:12PM +0200, Auger Eric wrote: > >>header = vfio_info_cap_add(caps, sizeof(*nesting_cap), > >> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, > >> 1); > >> @@ -2254,6 +2309,7 > >> @@ static int vfio_iommu_info_add_nesting_cap(struct > > vfio_iommu *iommu, > >>/* nesting iommu type supports PASID requests > >> (alloc/free) */ > >>nesting_cap->nesting_capabilities |= > >> VFIO_IOMMU_PASID_REQS; > > What is the meaning for ARM? > > I think it's just a software capability exposed to userspace, on > userspace side, it has a choice to use it or not. :-) The reason > define it and report it in cap nesting is that I'd like to make the > pasid alloc/free be available just for IOMMU with type > VFIO_IOMMU_TYPE1_NESTING. Please feel free tell me if it is not good > for ARM. We can find a proper way to report the availability. > >>> > >>> Well it is more a question for jean-Philippe. Do we have a system wide > >>> PASID allocation on ARM? > >> > >> We don't, the PASID spaces are per-VM on Arm, so this function should > >> consult the > >> IOMMU driver before setting flags. As you said on patch 3, nested doesn't > >> necessarily imply PASID support. The SMMUv2 does not support PASID but does > >> support nesting stages 1 and 2 for the IOVA space. > >> SMMUv3 support of PASID depends on HW capabilities. So I think this needs > >> to be > >> finer grained: > >> > >> Does the container support: > >> * VFIO_IOMMU_PASID_REQUEST? > >> -> Yes for VT-d 3 > >> -> No for Arm SMMU > >> * VFIO_IOMMU_{,UN}BIND_GUEST_PGTBL? > >> -> Yes for VT-d 3 > >> -> Sometimes for SMMUv2 > >> -> No for SMMUv3 (if we go with BIND_PASID_TABLE, which is simpler due to > >> PASID tables being in GPA space.) > >> * VFIO_IOMMU_BIND_PASID_TABLE? 
> >> -> No for VT-d > >> -> Sometimes for SMMUv3 > >> > >> Any bind support implies VFIO_IOMMU_CACHE_INVALIDATE support. > > > > good summary. do you expect to see any > > > >> > >> + nesting_cap->stage1_formats = formats; > > as spotted by Kevin, since a single format is supported, rename > > ok, I was believing it may be possible on ARM or so. :-) will rename > it. > >> > >> Yes I don't think an u32 is going to cut it for Arm :( We need to describe > >> all sorts of > >> capabilities for page and PASID tables (granules, GPA size, ASID/PASID > >> size, HW > >> access/dirty, etc etc.) Just saying "Arm stage-1 format" wouldn't mean > >> much. I > >> guess we could have a secondary vendor capability for these? > > > > Actually, I'm wondering if we can define some formats to stands for a set of > > capabilities. e.g. VTD_STAGE1_FORMAT_V1 which may indicates the 1st level > > page table related caps (aw, a/d, SRE, EA and etc.). And vIOMMU can parse > > the capabilities. > > But eventually do we really need all those capability getters? I mean > can't we simply rely on the actual call to VFIO_IOMMU_BIND_GUEST_PGTBL() > to detect any mismatch? Definitively the error handling may be heavier > on userspace but can't we manage. I think we need to present these capabilities at boot time, long before the guest triggers a bind(). For example if the host SMMU doesn't support 16-bit ASID, we need to communicate that to the guest using vSMMU ID registers or PROBE properties. Otherwise a bind() will succeed, but if the guest uses 16-bit ASIDs in its CD, DMA will result in C_BAD_CD events which we'll inject into the guest, for no apparent reason from their perspective. In addition some VMMs may have fallbacks if shared page tables are not available. They could fall back to a MAP/UNMAP interface, or simply not present a vIOMMU to the guest. Thanks, Jean > My fear is we end up with an overly > complex series. 
This capability getter may be interesting if we can > switch to a fallback implementation but here I guess we don't have any > fallback. With smmuv3 nested stage we don't have any fallback solution > either. For the versions, it is different because the userspace shall be > able to adapt (or not) to the max version supported by the kernel. > > Thanks > > Eric > > > > Regards, > > Yi Liu
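To make the trade-off in this thread concrete, here is a toy sketch of the finer-grained reporting Jean lists, with one flag per supported operation instead of a single opaque stage-1 format word. All names and values are hypothetical illustrations, not actual VFIO uAPI:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-feature nesting capability bits, one per operation
 * discussed in the thread, instead of one stage1_formats word. */
#define NESTING_CAP_SYSWIDE_PASID    (1U << 0) /* VFIO_IOMMU_PASID_REQUEST */
#define NESTING_CAP_BIND_PGTBL       (1U << 1) /* BIND_GUEST_PGTBL */
#define NESTING_CAP_BIND_PASID_TABLE (1U << 2) /* BIND_PASID_TABLE */
#define NESTING_CAP_CACHE_INVLD      (1U << 3) /* CACHE_INVALIDATE */

struct nesting_info {
    uint32_t nesting_capabilities;
    uint32_t stage1_format;      /* meaningful only with BIND_PGTBL */
};

/* Toy reports matching the per-IOMMU cases Jean enumerates. */
static struct nesting_info vtd3_caps(void)
{
    struct nesting_info info = {
        .nesting_capabilities = NESTING_CAP_SYSWIDE_PASID |
                                NESTING_CAP_BIND_PGTBL |
                                NESTING_CAP_CACHE_INVLD,
    };
    return info;
}

static struct nesting_info smmuv3_caps(void)
{
    struct nesting_info info = {
        /* No system-wide PASID alloc; PASID tables live in GPA space. */
        .nesting_capabilities = NESTING_CAP_BIND_PASID_TABLE |
                                NESTING_CAP_CACHE_INVLD,
    };
    return info;
}
```

Userspace would probe these flags once at container setup, which is exactly the boot-time discovery Jean argues for above.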
Re: [PATCH v11 05/10] iommu/vt-d: Add bind guest PASID support
Hi Jacob,

On 4/3/20 8:42 PM, Jacob Pan wrote:
> When supporting guest SVA with emulated IOMMU, the guest PASID
> table is shadowed in VMM. Updates to guest vIOMMU PASID table
> will result in PASID cache flush which will be passed down to
> the host as bind guest PASID calls.
>
> For the SL page tables, it will be harvested from device's
> default domain (request w/o PASID), or aux domain in case of
> mediated device.
>
>     .-------------.  .---------------------------.
>     |   vIOMMU    |  | Guest process CR3, FL only|
>     |             |  '---------------------------'
>     .----------------/
>     | PASID Entry |--- PASID cache flush -
>     '-------------'   |
>     |             |   V
>     |             |  CR3 in GPA
>     '-------------'
> Guest
> ------| Shadow |--------------------------|------
>       v        v                          v
> Host
>     .-------------.  .----------------------.
>     |   pIOMMU    |  | Bind FL for GVA-GPA  |
>     |             |  '----------------------'
>     .----------------/  |
>     | PASID Entry |     V (Nested xlate)
>     '----------------\.------------------------------.
>     |             |   |SL for GPA-HPA, default domain|
>     |             |   '------------------------------'
>     '-------------'
> Where:
> - FL = First level/stage one page tables
> - SL = Second level/stage two page tables
>
> ---
> v11: Fixed locking, avoid duplicated paging mode check, added helper to
> free svm if device list is empty. Use rate limited error message since
> the bind gpasid call comes from user space.
> ---
>
> Signed-off-by: Jacob Pan
> Signed-off-by: Liu, Yi L
> ---
>  drivers/iommu/intel-iommu.c | 4 +
>  drivers/iommu/intel-svm.c | 206
>  include/linux/intel-iommu.h | 8 +-
>  include/linux/intel-svm.h | 17
>  4 files changed, 234 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index c0dadec5a6b3..94c7993dac6a 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -6178,6 +6178,10 @@ const struct iommu_ops intel_iommu_ops = {
> 	.dev_disable_feat	= intel_iommu_dev_disable_feat,
> 	.is_attach_deferred	= intel_iommu_is_attach_deferred,
> 	.pgsize_bitmap		= INTEL_IOMMU_PGSIZES,
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +	.sva_bind_gpasid	= intel_svm_bind_gpasid,
> +	.sva_unbind_gpasid	= intel_svm_unbind_gpasid,
> +#endif
> };
>
> static void quirk_iommu_igfx(struct pci_dev *dev)
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index d7f2a5358900..7cf711318b87 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -226,6 +226,212 @@ static LIST_HEAD(global_svm_list);
> 	list_for_each_entry((sdev), &(svm)->devs, list)	\
> 		if ((d) != (sdev)->dev) {} else
>
> +
> +static inline void intel_svm_free_if_empty(struct intel_svm *svm, u64 pasid)
> +{
> +	if (list_empty(&svm->devs)) {
> +		ioasid_set_data(pasid, NULL);
> +		kfree(svm);
> +	}
> +}
> +
> +int intel_svm_bind_gpasid(struct iommu_domain *domain,
> +			struct device *dev,
> +			struct iommu_gpasid_bind_data *data)
> +{
> +	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> +	struct dmar_domain *dmar_domain;
> +	struct intel_svm_dev *sdev;
> +	struct intel_svm *svm;
> +	int ret = 0;
> +
> +	if (WARN_ON(!iommu) || !data)
> +		return -EINVAL;
> +
> +	if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> +	    data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> +		return -EINVAL;
> +
> +	if (dev_is_pci(dev)) {
> +		/* VT-d supports devices with full 20 bit PASIDs only */
> +		if (pci_max_pasids(to_pci_dev(dev))
!= PASID_MAX)
> +			return -EINVAL;
> +	} else {
> +		return -ENOTSUPP;
> +	}
> +
> +	/*
> +	 * We only check host PASID range, we have no knowledge to check
> +	 * guest PASID range.
> +	 */
> +	if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
> +		return -EINVAL;
> +
> +	dmar_domain = to_dmar_domain(domain);
> +
> +	mutex_lock(&pasid_mutex);
> +	svm = ioasid_find(NULL, data->hpasid, NULL);
> +	if (IS_ERR(svm)) {
> +		ret = PTR_ERR(svm);
> +		goto out;
> +	}
> +
> +	if (svm) {
> +		/*
> +		 * If we found svm for the PASID, there must be at
> +		 * least one device bond, otherwise svm should be freed.
> +		 */
> +		if (WARN_ON(list_empty(&svm->devs))) {
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +
> +		for_each_svm_dev(sdev, svm, dev) {
> +			/* In case of multiple sub-devices of the same pdev
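The host-side PASID range check quoted above can be isolated as below. Note that hpasid is a u64, so the `<= 0` in the patch reduces to `== 0`; PASID_MAX is assumed here to be 1 << 20, per the 20-bit PASID comment in the same hunk:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PASID_MAX (1U << 20)    /* VT-d: full 20-bit PASIDs */

/*
 * Mirror of the bind-time range check: the host validates only its own
 * PASID namespace; the guest PASID range is unknown to it.
 */
static int check_host_pasid(uint64_t hpasid)
{
    if (hpasid == 0 || hpasid >= PASID_MAX)
        return -EINVAL;
    return 0;
}
```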
Re: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs
On Wed, Apr 08, 2020 at 10:19:40AM -0600, Alex Williamson wrote: > On Tue, 7 Apr 2020 21:00:21 -0700 > "Raj, Ashok" wrote: > > > Hi Alex > > > > + Bjorn > > + Don > > > FWIW I can't understand why PCI SIG went different ways with ATS, > > where its enumerated on PF and VF. But for PASID and PRI its only > > in PF. > > > > I'm checking with our internal SIG reps to followup on that. > > > > On Tue, Apr 07, 2020 at 09:58:01AM -0600, Alex Williamson wrote: > > > > Is there vendor guarantee that hidden registers will locate at the > > > > same offset between PF and VF config space? > > > > > > I'm not sure if the spec really precludes hidden registers, but the > > > fact that these registers are explicitly outside of the capability > > > chain implies they're only intended for device specific use, so I'd say > > > there are no guarantees about anything related to these registers. > > > > As you had suggested in the other thread, we could consider > > using the same offset as in PF, but even that's a better guess > > still not reliable. > > > > The other option is to maybe extend driver ops in the PF to expose > > where the offsets should be. Sort of adding the quirk in the > > implementation. > > > > I'm not sure how prevalent are PASID and PRI in VF devices. If SIG is > > resisting > > making VF's first class citizen, we might ask them to add some verbiage > > to suggest leave the same offsets as PF open to help emulation software. > > Even if we know where to expose these capabilities on the VF, it's not > clear to me how we can actually virtualize the capability itself. If > the spec defines, for example, an enable bit as r/w then software that > interacts with that register expects the bit is settable. There's no > protocol for "try to set the bit and re-read it to see if the hardware > accepted it". Therefore a capability with a fixed enable bit > representing the state of the PF, not settable by the VF, is > disingenuous to the spec. 
Would it be OK to implement a lock down mechanism for the PF PASID capability, preventing changes to the PF cap when the VF is in use by VFIO? The emulation would still break the spec: since the PF cap would always be enabled the VF configuration bits would have no effect, but it seems preferable to having the Enable bit not enable anything. > > If what we're trying to do is expose that PASID and PRI are enabled on > the PF to a VF driver, maybe duplicating the PF capabilities on the VF > without the ability to control it is not the right approach. Maybe we > need new capabilities exposing these as slave features that cannot be > controlled? We could define our own vendor capability for this, but of > course we have both the where to put it in config space issue, as well > as the issue of trying to push an ad-hoc standard. vfio could expose > these as device features rather than emulating capabilities, but that > still leaves a big gap between vfio in the hypervisor and the driver in > the guest VM. That might still help push the responsibility and policy > for how to expose it to the VM as a userspace problem though. Userspace might have more difficulty working around the issues mentioned in this thread. They would still need a guarantee that the PF PASID configuration doesn't change at runtime, and they wouldn't have the ability to talk to a vendor driver to figure out where to place the fake PASID capability. Thanks, Jean > > I agree though, I don't know why the SIG would preclude implementing > per VF control of these features. Thanks, > > Alex > > > > FWIW, vfio started out being more strict about restricting config space > > > access to defined capabilities, until... > > > > > > commit a7d1ea1c11b33bda2691f3294b4d735ed635535a > > > Author: Alex Williamson > > > Date: Mon Apr 1 09:04:12 2013 -0600 > > > > > > > Cheers, > > Ashok
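The spec deviation under discussion can be shown with a toy model of an emulated VF PASID capability: reads mirror the PF state, and the VF's Enable bit is effectively read-only because writes are dropped. This is an illustration of the problem, not a proposed implementation:

```c
#include <assert.h>
#include <stdint.h>

#define PASID_CTRL_ENABLE 0x1

/* Toy emulated VF PASID capability, backed by PF state only. */
struct vf_pasid_cap {
    uint16_t pf_ctrl;   /* real control state, owned by the PF driver */
};

static uint16_t vcap_read_ctrl(const struct vf_pasid_cap *cap)
{
    return cap->pf_ctrl;    /* the VF always sees the PF's settings */
}

static void vcap_write_ctrl(struct vf_pasid_cap *cap, uint16_t val)
{
    /* No per-VF control exists in hardware, so the write is dropped --
     * exactly the "Enable bit that doesn't enable anything" Alex
     * objects to, since the spec defines the bit as read/write. */
    (void)cap;
    (void)val;
}
```

A guest driver clearing Enable and re-reading it would see the bit still set, which no PCIe-conformant software expects.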
Re: [PATCH 17/28] mm: remove the prot argument from vm_map_ram
On Wed, Apr 08, 2020 at 01:59:15PM +0200, Christoph Hellwig wrote: > This is always GFP_KERNEL - for long term mappings with other properties > vmap should be used. > > Signed-off-by: Christoph Hellwig > --- > drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c | 2 +- > drivers/media/common/videobuf2/videobuf2-dma-sg.c | 3 +-- > drivers/media/common/videobuf2/videobuf2-vmalloc.c | 3 +-- > fs/erofs/decompressor.c| 2 +- For EROFS part, Acked-by: Gao Xiang Thanks, Gao Xiang
Re: [PATCH 0/2] iommu: Remove iommu_sva_ops::mm_exit()
On Wed, Apr 08, 2020 at 04:48:02PM -0700, Jacob Pan wrote: > On Wed, 8 Apr 2020 19:32:18 -0300 > Jason Gunthorpe wrote: > > > On Wed, Apr 08, 2020 at 02:35:52PM -0700, Jacob Pan wrote: > > > > On Wed, Apr 08, 2020 at 11:35:52AM -0700, Jacob Pan wrote: > > > > > Hi Jean, > > > > > > > > > > On Wed, 8 Apr 2020 16:04:25 +0200 > > > > > Jean-Philippe Brucker wrote: > > > > > > > > > > > The IOMMU SVA API currently requires device drivers to > > > > > > implement an mm_exit() callback, which stops device jobs that > > > > > > do DMA. This function is called in the release() MMU > > > > > > notifier, when an address space that is shared with a device > > > > > > exits. > > > > > > > > > > > > It has been noted several time during discussions about SVA > > > > > > that cancelling DMA jobs can be slow and complex, and doing > > > > > > it in the release() notifier might cause synchronization > > > > > > issues (patch 2 has more background). Device drivers must in > > > > > > any case call unbind() to remove their bond, after stopping > > > > > > DMA from a more favorable context (release of a file > > > > > > descriptor). > > > > > > > > > > > > So after mm exits, rather than notifying device drivers, we > > > > > > can hold on to the PASID until unbind(), ask IOMMU drivers to > > > > > > silently abort DMA and Page Requests in the meantime. This > > > > > > change should relieve the mmput() path. > > > > > > > > > > I assume mm is destroyed after all the FDs are closed > > > > > > > > FDs do not hold a mmget(), but they may hold a mmgrab(), ie > > > > anything using mmu_notifiers has to hold a grab until the > > > > notifier is destroyed, which is often triggered by FD close. > > > > > > > Sorry, I don't get this. Are you saying we have to hold a mmgrab() > > > between svm_bind/mmu_notifier_register and > > > svm_unbind/mmu_notifier_unregister? > > > > Yes. This is done automatically for the caller inside the mmu_notifier > > implementation. 
We now even store the mm_struct pointer inside the > > notifier. > > > > Once a notifier is registered the mm_struct remains valid memory until > > the notifier is unregistered. > > > > > Isn't the idea of mmu_notifier is to avoid holding the mm reference > > > and rely on the notifier to tell us when mm is going away? > > > > The notifier only holds a mmgrab(), not a mmget() - this allows > > exit_mmap to proceed, but the mm_struct memory remains. > > > > This is also probably why it is a bad idea to tie the lifetime of > > something like a pasid to the mmdrop as a evil user could cause a > > large number of mm structs to be released but not freed, probably > > defeating cgroup limits and so forth (not sure) > > > > > It seems both Intel and AMD iommu drivers don't hold mmgrab after > > > mmu_notifier_register. > > > > It is done internally to the implementation. > > > > > > So the exit_mmap() -> release() may happen before the FDs are > > > > destroyed, but the final mmdrop() will be during some FD release > > > > when the final mmdrop() happens. > > > > > > Do you mean mmdrop() is after FD release? > > > > Yes, it will be done by the mmu_notifier_unregister(), which should be > > called during FD release if the iommu lifetime is linked to some FD. > > > > > If so, unbind is called in FD release should take care of > > > everything, i.e. stops DMA, clear PASID context on IOMMU, flush PRS > > > queue etc. > > > > Yes, this is the proper way, when the DMA is stopped and no use of the > > PASID remains then you can drop the mmu notifier and release the PASID > > entirely. If that is linked to the lifetime of the FD then forget > > completely about the mm_struct lifetime, it doesn't matter.. > > > Got everything above, thanks a lot. > > If everything is in order with the FD close. Why do we need to > "ask IOMMU drivers to silently abort DMA and Page Requests in the > meantime." in mm_exit notifier? This will be done orderly in unbind > anyway. 
When the process is killed, mm release can happen before fds are released. If you look at do_exit() in kernel/exit.c: exit_mm() mmput() -> mmu release notifier ... exit_files() close_files() fput() exit_task_work() __fput() -> unbind() Thanks, Jean > > > > Enforcing unbind upon FD close might be a precarious path, perhaps > > > that is why we have to deal with out of order situation? > > > > How so? You just put it in the FD release function :) > > > I was thinking some driver may choose to defer unbind in some workqueue > etc. > > > > > > In VT-d, because of enqcmd and lazy PASID free we plan to hold > > > > > on to the PASID until mmdrop. > > > > > https://lore.kernel.org/patchwork/patch/1217762/ > > > > > > > > Why? The bind already gets a mmu_notifier which has refcounts and > > > > the right lifetime for PASID.. This code could already be > > > > simplified by using the
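Jason's distinction between the two mm_struct reference counts can be modeled with a toy version: mm_users (mmget/mmput) keeps the address space alive, mm_count (mmgrab/mmdrop) keeps only the struct itself alive, and the users collectively hold one mm_count reference. A sketch of the semantics, not the kernel's implementation:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the two mm_struct refcounts discussed above:
 * mm_users keeps the address space mapped, mm_count keeps the
 * struct memory alive. exit_mmap() runs when mm_users hits zero,
 * even while notifiers still hold mm_count references. */
struct mm {
    int mm_users;   /* mmget()/mmput() */
    int mm_count;   /* mmgrab()/mmdrop(); users hold one of these */
    bool mapped;    /* address space still present */
    bool freed;     /* struct memory released */
};

static void mmdrop(struct mm *mm)
{
    if (--mm->mm_count == 0)
        mm->freed = true;
}

static void mmput(struct mm *mm)
{
    if (--mm->mm_users == 0) {
        mm->mapped = false;  /* exit_mmap(): tear down the mappings */
        mmdrop(mm);          /* drop the mm_count held by the users */
    }
}
```

This shows why a PASID tied to mmdrop() outlives exit_mmap(): after the last mmput() the mappings are gone, but a registered notifier's grab keeps the struct (and here, a PASID pinned to it) alive until unregister at FD release.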